Re: Do you have an AMD Ryzen or Epyc CPU?
Hi, I applied your code on my AMD Ryzen 7 1700X. Below is the dmesg. I hope this helps, if you have any other AMD Ryzen related stuff that needs testing please let me know. Greetings Leo OpenBSD 6.3-current (GENERIC.MP) #2: Wed Jun 20 05:42:54 CEST 2018 batman@batdesk.wayne-enterprises.company:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 17106407424 (16313MB) avail mem = 16447160320 (15685MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xea850 (59 entries) bios0: vendor American Megatrends Inc. version "1.90" date 09/20/2017 bios0: Micro-Star International Co., Ltd. MS-7A32 acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT UEFI IVRS SSDT SSDT acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP2(S4) PTXH(S4) OBLW(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) GPPC(S4) GPPD(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 32 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: AMD Ryzen 7 1700X Eight-Core Processor, 3400.51 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu0: eax 0x ebx 0x0100 ecx 0x cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu1: eax 0x0001 ebx 0x0100 ecx 0x cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu2: eax 0x0002 ebx 0x0101 ecx 0x cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu3: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache cpu3: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu3: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative cpu3: eax 0x0003 ebx 0x0101 ecx 0x cpu3: smt 0, core 3, package 0 cpu4 at mainbus0: apid 4 (application processor) cpu4: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz cpu4:
iked support for IKEv2 Message Fragmentation (RFC 7383)
Hello tech@, My IKEv2 sessions are occasionally down due to transit networks dropping UDP fragments for one reason or another[1]. It happens frequently enough that I am considering implementing support for RFC 7383 in iked. Before I dig in, I feel that I should ask if anyone has already started on such work. If not, perhaps someone that is familiar with the code could suggest an approach at a high level? Thanks for any advice, -TimS [1] Whenver I've asked, the reason is usually something about DDoS prevention. -- Tim Stewart --- Mail: t...@stoo.org Matrix: @tim:stoo.org
Re: Do you have an AMD Ryzen or Epyc CPU?
On Tue, Jun 19, 2018 at 4:32 PM Mark Kettenis wrote: > Looking for people that can run a kernel with the patch below and mail > me the resulting dmesg. > > Thanks, > > Mark > > > Index: arch/amd64/amd64/identcpu.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v > retrieving revision 1.96 > diff -u -p -r1.96 identcpu.c > --- arch/amd64/amd64/identcpu.c 7 Jun 2018 04:07:28 - 1.96 > +++ arch/amd64/amd64/identcpu.c 19 Jun 2018 20:27:24 - > @@ -803,6 +803,11 @@ cpu_topology(struct cpu_info *ci) > /* Pkg id is the upper remaining bits */ > ci->ci_pkg_id = apicid & ~core_mask; > ci->ci_pkg_id >>= core_bits; > + if (ci->ci_pnfeatset >= 0x801e) { > + CPUID(0x801e, eax, ebx, ecx, edx); > + printf("cpu%d: eax 0x%08x ebx 0x%08x ecx 0x%08x\n", > + ci->ci_cpuid, eax, ebx, ecx); > + } > } else if (strcmp(cpu_vendor, "GenuineIntel") == 0) { > /* We only support leaf 1/4 detection */ > if (cpuid_level < 4) Hi Mark, My rig is running a Ryzen 7 1800x would that suffice for a test? Ethan > > >
Re: mandoc potential memory leak fix
Hi, Florian Obser wrote on Mon, Jun 18, 2018 at 06:54:49PM +0200: > On Mon, Jun 18, 2018 at 04:37:32PM +0200, Jan Schreiber wrote: >> this patch closes potential memory leaks in the mandoc memory >> wrapper functions and follows the examples in the manpages. > These are not leaks since mandoc exits via err(3) immediately after an > allocation failure. Which is the whole point of these wrappers I > imagine. Exactly, so please don't commit these changes, they make the code less readable for no benefit. Where these functions are used, the calling code typically already holds hundreds of pointers to allocated memory for the syntax tree that is in the process of being built. These functions save us from having to call functions that recursively free the syntax tree from each and every place where memory allocation can fail. Yours, Ingo
Do you have an AMD Ryzen or Epyc CPU?
Looking for people that can run a kernel with the patch below and mail me the resulting dmesg. Thanks, Mark Index: arch/amd64/amd64/identcpu.c === RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v retrieving revision 1.96 diff -u -p -r1.96 identcpu.c --- arch/amd64/amd64/identcpu.c 7 Jun 2018 04:07:28 - 1.96 +++ arch/amd64/amd64/identcpu.c 19 Jun 2018 20:27:24 - @@ -803,6 +803,11 @@ cpu_topology(struct cpu_info *ci) /* Pkg id is the upper remaining bits */ ci->ci_pkg_id = apicid & ~core_mask; ci->ci_pkg_id >>= core_bits; + if (ci->ci_pnfeatset >= 0x801e) { + CPUID(0x801e, eax, ebx, ecx, edx); + printf("cpu%d: eax 0x%08x ebx 0x%08x ecx 0x%08x\n", + ci->ci_cpuid, eax, ebx, ecx); + } } else if (strcmp(cpu_vendor, "GenuineIntel") == 0) { /* We only support leaf 1/4 detection */ if (cpuid_level < 4)
Re: ospfd: deal with /etc/netstart, changes of netmask and dest_addr
On Tue, Jun 19, 2018 at 03:59:24PM +0100, Stuart Henderson wrote: > On 2018/06/18 08:53, Remi Locherer wrote: > > Index: ospfd.h > > === > > RCS file: /cvs/src/usr.sbin/ospfd/ospfd.h,v > > retrieving revision 1.100 > > diff -u -p -r1.100 ospfd.h > > --- ospfd.h 11 Feb 2018 02:27:33 - 1.100 > > +++ ospfd.h 12 Jun 2018 20:41:43 - > ... > > -struct ifaddrdel { > > +struct ifaddr { > > struct in_addr addr; > > + struct in_addr mask; > > + struct in_addr dst; > > unsigned intifindex; > > }; > > I think it would be better to use a different name. > > Even if there's no actual conflict as if_var.h isn't pulled in, this is > renaming this struct to "struct ifaddr" is at least confusing. Oh yes, that is unfortunate. What would be a better name for this struct? iface_addr maybe? Or if_addrmask?
Re: On the matter of OpenBSD breaking embargos (KRACK)
On Tue, 19 Jun 2018 09:30:05 -0600 > > > > > ps. Disable Intel Hyper-Threading where not needed, until we all > > > know more. > > > > Is it safer to use bsd.sp for the time being? > > No, a better solution is coming. And in snapshots already. Thankyou for enabling us to patch ASAP.
Re: [patch] Add omitting period for ktrace(2) manual
On Tue, Jun 19, 2018 at 04:22:26PM +0800, Nan Xiao wrote: > Hi tech@, > > To be consistent with other trace points, this patch adds omitting > period for KTRFAC_STRUCT, thanks! > fixed, thanks. jmc > Index: ktrace.2 > === > RCS file: /cvs/src/lib/libc/sys/ktrace.2,v > retrieving revision 1.35 > diff -u -p -r1.35 ktrace.2 > --- ktrace.2 28 Nov 2017 06:03:41 - 1.35 > +++ ktrace.2 19 Jun 2018 08:10:38 - > @@ -108,7 +108,7 @@ Trace all I/O > .It Dv KTRFAC_PSIG > Trace posted signals. > .It Dv KTRFAC_STRUCT > -Trace various structs > +Trace various structs. > .It Dv KTRFAC_USER > Trace user data coming from > .Xr utrace 2 > > -- > Best Regards > Nan Xiao >
Re: Unlock sendit-based syscalls
On Tue, Jun 19, 2018 at 03:58:51PM +0200, Mark Kettenis wrote: > > Date: Tue, 19 Jun 2018 15:38:01 +0200 > > From: Martin Pieuchot > > > > On 19/06/18(Tue) 14:55, Mark Kettenis wrote: > > > > To avoid races with another thread that might be clearing our pointer > > > > in `fd_ofiles', we need more than atomic operations. For that we need > > > > to serialize the threads. The most simple way to do so is with a mutex > > > > on a different data structure. Either global, like in my diff, or as > > > > visa@ suggested in the corresponding `fdp'. > > > > > > Right. Another case of trying to take a reference without holding one > > > already. The trick here is to use the fact that as long as there is a > > > file descriptor allocated for this open file the reference count is at > > > least 1. So in that case taking a reference is safe. Your global > > > mutex does indeed do the trick. But why aren't you using the file > > > descriptor table rwlock for this purpose? Is that because there is a > > > lock ordering problem between the kernel lock and that rwlock? > > > > I have two reasons. First I don't want to introduce new sleeping points, > > secondly I want this mutex to disappear. IMHO extending the scope of a > > lock is going in the wrong direction because then we'll want to split it. > > That's why I'm happy that our discussion made visa@ look at improving this. > > I wouldn't call this extending the scope of the lock. But yes, it > might cause unnecessary sleeps if a write lock is held for the purpose > of opening a file. The mutex that visa@ proposes trades that in for > potential contion in the multiple-readers case. Below is a revised version of the diff. Like mpi@'s diff, it uses a `fhdlk' mutex for serializing access to the file list. However, that lock is not used for anything else. Each file descriptor table has a dedicated mutex that allows fd_getfile() acquire a file reference safely. fd_iterfile() uses a compare-and-swap sequence for reference acquisition. The sequence ensures that once `f_count' has become zero the value is never raised. This diff does not unlock any system calls. Index: kern/kern_descrip.c === RCS file: src/sys/kern/kern_descrip.c,v retrieving revision 1.166 diff -u -p -r1.166 kern_descrip.c --- kern/kern_descrip.c 18 Jun 2018 09:15:05 - 1.166 +++ kern/kern_descrip.c 19 Jun 2018 15:13:05 - @@ -66,7 +66,11 @@ /* * Descriptor management. + * + * We need to block interrupts as long as `fhdlk' is being taken + * with and without the KERNEL_LOCK(). */ +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR); struct filelist filehead; /* head of list of open files */ int numfiles; /* actual number of open files */ @@ -194,17 +198,25 @@ struct file * fd_iterfile(struct file *fp, struct proc *p) { struct file *nfp; + unsigned int count; + mtx_enter(); if (fp == NULL) nfp = LIST_FIRST(); else nfp = LIST_NEXT(fp, f_list); /* don't FREF when f_count == 0 to avoid race in fdrop() */ - while (nfp != NULL && nfp->f_count == 0) - nfp = LIST_NEXT(nfp, f_list); - if (nfp != NULL) - FREF(nfp); + while (nfp != NULL) { + count = nfp->f_count; + if (count == 0) { + nfp = LIST_NEXT(nfp, f_list); + continue; + } + if (atomic_cas_uint(>f_count, count, count + 1) == count) + break; + } + mtx_leave(); if (fp != NULL) FRELE(fp, p); @@ -217,10 +229,17 @@ fd_getfile(struct filedesc *fdp, int fd) { struct file *fp; - if ((u_int)fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[fd]) == NULL) + vfs_stall_barrier(); + + if ((u_int)fd >= fdp->fd_nfiles) return (NULL); - FREF(fp); + mtx_enter(>fd_fplock); + fp = fdp->fd_ofiles[fd]; + if (fp != NULL) + atomic_inc_int(>f_count); + mtx_leave(>fd_fplock); + return (fp); } @@ -639,7 +658,7 @@ finishdup(struct proc *p, struct file *f fdpassertlocked(fdp); KASSERT(fp->f_iflags & FIF_INSERTED); - if (fp->f_count == LONG_MAX-2) { + if (fp->f_count >= FDUP_MAX_COUNT) { FRELE(fp, p); return (EDEADLK); } @@ -653,8 +672,10 @@ finishdup(struct proc *p, struct file *f fd_used(fdp, new); } + mtx_enter(>fd_fplock); fdp->fd_ofiles[new] = fp; fdp->fd_ofileflags[new] = fdp->fd_ofileflags[old] & ~UF_EXCLOSE; + mtx_leave(>fd_fplock); *retval = new; if (oldfp != NULL) { @@ -671,22 +692,34 @@ fdinsert(struct filedesc *fdp, int fd, i struct file *fq; fdpassertlocked(fdp); + + mtx_enter(); + fp->f_iflags |=
Re: On the matter of OpenBSD breaking embargos (KRACK)
Kevin Chadwick wrote: > On Fri, 15 Jun 2018 17:28:14 -0600 > > > > ps. Disable Intel Hyper-Threading where not needed, until we all know > > more. > > Is it safer to use bsd.sp for the time being? No, a better solution is coming. And in snapshots already.
Re: On the matter of OpenBSD breaking embargos (KRACK)
On Fri, 15 Jun 2018 17:28:14 -0600 > ps. Disable Intel Hyper-Threading where not needed, until we all know > more. Is it safer to use bsd.sp for the time being?
Re: ospfd: deal with /etc/netstart, changes of netmask and dest_addr
On 2018/06/18 08:53, Remi Locherer wrote: > Index: ospfd.h > === > RCS file: /cvs/src/usr.sbin/ospfd/ospfd.h,v > retrieving revision 1.100 > diff -u -p -r1.100 ospfd.h > --- ospfd.h 11 Feb 2018 02:27:33 - 1.100 > +++ ospfd.h 12 Jun 2018 20:41:43 - ... > -struct ifaddrdel { > +struct ifaddr { > struct in_addr addr; > + struct in_addr mask; > + struct in_addr dst; > unsigned intifindex; > }; I think it would be better to use a different name. Even if there's no actual conflict as if_var.h isn't pulled in, this is renaming this struct to "struct ifaddr" is at least confusing.
Re: Prevent races w/ f_data
> Date: Tue, 19 Jun 2018 15:45:48 +0200 > From: Martin Pieuchot > > On 19/06/18(Tue) 15:08, Mark Kettenis wrote: > > > Date: Tue, 19 Jun 2018 12:51:35 +0200 > > > From: Martin Pieuchot > > > > > > There's one place in our kernel where `f_data' is overwritten while the > > > descriptor sits in multiple shared data structures. It is in diskmap. > > > > > > We want to avoid this situation to be able to treat f_data as immutable. > > > > > > So the diff below remove `fp' from the shared data structures before it > > > gets modified. It is not a complete solution to the reference grabbing > > > problem of vnode. There's still a window where a race is possible > > > between the moment where we release the mutex and increase the vnode's > > > reference. The KERNEL_LOCK() is what protects us for now. However this > > > problem is the next one to solve to be able to unlock syscalls messing > > > with vnodes. > > > > > > Still this diff is a step towards that. > > > > > > Note that this change only apply to vnodes, so it doesn't matter for > > > socket-related syscalls. > > > > > > Ok? > > > > I think this is the wrong approach. Instead of modifying the struct > > file, this code should create a new struct file. Then replace the > > pointer in the file descriptor table with that new file and then drop > > the reference to the old struct file. That way f_data *is* immutable. > > I tried that first... well it didn't work for me. > > The problem is that once you swapped `fp' you still need to free the old > one. But you don't want to close the associated file, so you end up > NULLing `f_data' before calling closef()... You simply call FRELE() on the old struct file. That should properly close the file once the last reference goes. Obviously you'd remove the code that closes the original vnode from diskmapioctl(). I don't see how closef() would be called in that scenario, since you're not releasing the file descriptor. > On top of that this approach introduces a new error condition if we > reach `maxfiles', yes this is a degenerated case but I'd prefer not > have to deal with it. True. And it changes the semantics of the DIOCMAP ioctl. Currently, if you call DIOCMAP on a dup'ed file descriptor it would change the open file associated with all of them. With my suggestion, it will only change the open file for the file descriptor you're operating on. The other file descriptor would continue to refer to the original open file (typically the diskmap device, but not necessarily). I'd argue that is actually how this ioctl should behave. In fact the current behaviour is a bit of a security risk. I could open a file, pass the descriptor to another process and then call DIOCMAP and suddenly the other process has a ilfe descriptor that refers to a disk device instead of a normal file. > The approach below returns `fp' into the "larval" or "reserved" state, > change it, then insert it again. > > > > > > Index: dev/diskmap.c > > > === > > > RCS file: /cvs/src/sys/dev/diskmap.c,v > > > retrieving revision 1.20 > > > diff -u -p -r1.20 diskmap.c > > > --- dev/diskmap.c 9 May 2018 08:42:02 - 1.20 > > > +++ dev/diskmap.c 19 Jun 2018 10:38:37 - > > > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > struct filedesc *fdp = p->p_fd; > > > struct file *fp = NULL; > > > struct vnode *vp = NULL, *ovp; > > > - char *devname; > > > + char *devname, flags; > > > int fd, error = EINVAL; > > > > > > if (cmd != DIOCMAP) > > > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > vput(ovp); > > > } > > > > > > + /* Remove the file while we're modifying it to prevent races. */ > > > + flags = fdp->fd_ofileflags[fd]; > > > + mtx_enter(); > > > + fdp->fd_ofiles[fd] = NULL; > > > + fdp->fd_ofileflags[fd] = 0; > > > + if (fp->f_iflags & FIF_INSERTED) > > > + LIST_REMOVE(fp, f_list); > > > + mtx_leave(); > > > + > > > fp->f_data = (caddr_t)vp; > > > fp->f_offset = 0; > > > mtx_enter(>f_mtx); > > > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > fp->f_rbytes = 0; > > > fp->f_wbytes = 0; > > > mtx_leave(>f_mtx); > > > + > > > + /* Insert it back where it was. */ > > > + fdinsert(fdp, fd, flags, fp); > > > > > > VOP_UNLOCK(vp); > > > > > > > > > > >
Re: Unlock sendit-based syscalls
> Date: Tue, 19 Jun 2018 15:38:01 +0200 > From: Martin Pieuchot > > On 19/06/18(Tue) 14:55, Mark Kettenis wrote: > > > To avoid races with another thread that might be clearing our pointer > > > in `fd_ofiles', we need more than atomic operations. For that we need > > > to serialize the threads. The most simple way to do so is with a mutex > > > on a different data structure. Either global, like in my diff, or as > > > visa@ suggested in the corresponding `fdp'. > > > > Right. Another case of trying to take a reference without holding one > > already. The trick here is to use the fact that as long as there is a > > file descriptor allocated for this open file the reference count is at > > least 1. So in that case taking a reference is safe. Your global > > mutex does indeed do the trick. But why aren't you using the file > > descriptor table rwlock for this purpose? Is that because there is a > > lock ordering problem between the kernel lock and that rwlock? > > I have two reasons. First I don't want to introduce new sleeping points, > secondly I want this mutex to disappear. IMHO extending the scope of a > lock is going in the wrong direction because then we'll want to split it. > That's why I'm happy that our discussion made visa@ look at improving this. I wouldn't call this extending the scope of the lock. But yes, it might cause unnecessary sleeps if a write lock is held for the purpose of opening a file. The mutex that visa@ proposes trades that in for potential contion in the multiple-readers case.
Re: Prevent races w/ f_data
On 19/06/18(Tue) 15:08, Mark Kettenis wrote: > > Date: Tue, 19 Jun 2018 12:51:35 +0200 > > From: Martin Pieuchot > > > > There's one place in our kernel where `f_data' is overwritten while the > > descriptor sits in multiple shared data structures. It is in diskmap. > > > > We want to avoid this situation to be able to treat f_data as immutable. > > > > So the diff below remove `fp' from the shared data structures before it > > gets modified. It is not a complete solution to the reference grabbing > > problem of vnode. There's still a window where a race is possible > > between the moment where we release the mutex and increase the vnode's > > reference. The KERNEL_LOCK() is what protects us for now. However this > > problem is the next one to solve to be able to unlock syscalls messing > > with vnodes. > > > > Still this diff is a step towards that. > > > > Note that this change only apply to vnodes, so it doesn't matter for > > socket-related syscalls. > > > > Ok? > > I think this is the wrong approach. Instead of modifying the struct > file, this code should create a new struct file. Then replace the > pointer in the file descriptor table with that new file and then drop > the reference to the old struct file. That way f_data *is* immutable. I tried that first... well it didn't work for me. The problem is that once you swapped `fp' you still need to free the old one. But you don't want to close the associated file, so you end up NULLing `f_data' before calling closef()... On top of that this approach introduces a new error condition if we reach `maxfiles', yes this is a degenerated case but I'd prefer not have to deal with it. The approach below returns `fp' into the "larval" or "reserved" state, change it, then insert it again. > > > Index: dev/diskmap.c > > === > > RCS file: /cvs/src/sys/dev/diskmap.c,v > > retrieving revision 1.20 > > diff -u -p -r1.20 diskmap.c > > --- dev/diskmap.c 9 May 2018 08:42:02 - 1.20 > > +++ dev/diskmap.c 19 Jun 2018 10:38:37 - > > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > struct filedesc *fdp = p->p_fd; > > struct file *fp = NULL; > > struct vnode *vp = NULL, *ovp; > > - char *devname; > > + char *devname, flags; > > int fd, error = EINVAL; > > > > if (cmd != DIOCMAP) > > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > vput(ovp); > > } > > > > + /* Remove the file while we're modifying it to prevent races. */ > > + flags = fdp->fd_ofileflags[fd]; > > + mtx_enter(); > > + fdp->fd_ofiles[fd] = NULL; > > + fdp->fd_ofileflags[fd] = 0; > > + if (fp->f_iflags & FIF_INSERTED) > > + LIST_REMOVE(fp, f_list); > > + mtx_leave(); > > + > > fp->f_data = (caddr_t)vp; > > fp->f_offset = 0; > > mtx_enter(>f_mtx); > > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > fp->f_rbytes = 0; > > fp->f_wbytes = 0; > > mtx_leave(>f_mtx); > > + > > + /* Insert it back where it was. */ > > + fdinsert(fdp, fd, flags, fp); > > > > VOP_UNLOCK(vp); > > > > > >
Re: Unlock sendit-based syscalls
On 19/06/18(Tue) 14:55, Mark Kettenis wrote: > > To avoid races with another thread that might be clearing our pointer > > in `fd_ofiles', we need more than atomic operations. For that we need > > to serialize the threads. The most simple way to do so is with a mutex > > on a different data structure. Either global, like in my diff, or as > > visa@ suggested in the corresponding `fdp'. > > Right. Another case of trying to take a reference without holding one > already. The trick here is to use the fact that as long as there is a > file descriptor allocated for this open file the reference count is at > least 1. So in that case taking a reference is safe. Your global > mutex does indeed do the trick. But why aren't you using the file > descriptor table rwlock for this purpose? Is that because there is a > lock ordering problem between the kernel lock and that rwlock? I have two reasons. First I don't want to introduce new sleeping points, secondly I want this mutex to disappear. IMHO extending the scope of a lock is going in the wrong direction because then we'll want to split it. That's why I'm happy that our discussion made visa@ look at improving this.
Re: Prevent races w/ f_data
> Date: Tue, 19 Jun 2018 12:51:35 +0200 > From: Martin Pieuchot > > There's one place in our kernel where `f_data' is overwritten while the > descriptor sits in multiple shared data structures. It is in diskmap. > > We want to avoid this situation to be able to treat f_data as immutable. > > So the diff below remove `fp' from the shared data structures before it > gets modified. It is not a complete solution to the reference grabbing > problem of vnode. There's still a window where a race is possible > between the moment where we release the mutex and increase the vnode's > reference. The KERNEL_LOCK() is what protects us for now. However this > problem is the next one to solve to be able to unlock syscalls messing > with vnodes. > > Still this diff is a step towards that. > > Note that this change only apply to vnodes, so it doesn't matter for > socket-related syscalls. > > Ok? I think this is the wrong approach. Instead of modifying the struct file, this code should create a new struct file. Then replace the pointer in the file descriptor table with that new file and then drop the reference to the old struct file. That way f_data *is* immutable. > Index: dev/diskmap.c > === > RCS file: /cvs/src/sys/dev/diskmap.c,v > retrieving revision 1.20 > diff -u -p -r1.20 diskmap.c > --- dev/diskmap.c 9 May 2018 08:42:02 - 1.20 > +++ dev/diskmap.c 19 Jun 2018 10:38:37 - > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > struct filedesc *fdp = p->p_fd; > struct file *fp = NULL; > struct vnode *vp = NULL, *ovp; > - char *devname; > + char *devname, flags; > int fd, error = EINVAL; > > if (cmd != DIOCMAP) > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > vput(ovp); > } > > + /* Remove the file while we're modifying it to prevent races. */ > + flags = fdp->fd_ofileflags[fd]; > + mtx_enter(); > + fdp->fd_ofiles[fd] = NULL; > + fdp->fd_ofileflags[fd] = 0; > + if (fp->f_iflags & FIF_INSERTED) > + LIST_REMOVE(fp, f_list); > + mtx_leave(); > + > fp->f_data = (caddr_t)vp; > fp->f_offset = 0; > mtx_enter(>f_mtx); > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > fp->f_rbytes = 0; > fp->f_wbytes = 0; > mtx_leave(>f_mtx); > + > + /* Insert it back where it was. */ > + fdinsert(fdp, fd, flags, fp); > > VOP_UNLOCK(vp); > > >
Re: Unlock sendit-based syscalls
> Date: Tue, 19 Jun 2018 10:54:21 +0200 > From: Martin Pieuchot > > On 18/06/18(Mon) 18:18, Mark Kettenis wrote: > > > Date: Mon, 18 Jun 2018 11:24:00 +0200 > > > From: Martin Pieuchot > > > > > > Diff below is the last of the serie to remove the KERNEL_LOCK() from > > > sendto(2) and sendmsg(2) for sockets protected by the NET_LOCK(). > > > > > > As explained previously [0] this diff uses a simple non-intrusive > > > approached based on a single mutex. I'd like to get this in before > > > doing any refactoring of this layer to get real test coverages. > > > > > > [0] https://marc.info/?l=openbsd-tech=152699644924165=2 > > > > > > Tests? Comments? Oks? > > > > At first sight this diff seems to mess up the locking quite a bit. > > I'm sure that is intentional, but still I'm not sure this is the right > > direction. In my mind we should at some point in the future be able > > to use atomic operations to manage the reference count on struct file > > (f_count). Using atomic operations has the huge benefit of not having > > lock ordering problems! But it looks as if this diff gets us further > > away from that goal. > > > > You'll still need a lock for the global list of course. It is easy to > > protect insertions and removals. The real problem is indeed > > fd_iterfile(). There you want to take a reference but that isn't > > really safe since you don't necessarily hold one already. That's why > > we have that f_count == 0 check there. And as long as you're holding > > the lock that protects the list you know the struct file is not going > > to disappear. So if f_count > 0 you can safely take a reference. But > > calling FREF while holding a mutex is problematic because of > > vfs_stall_barrier(). You're correct that vfs_stall_barrier() isn't > > required here. All you need to do is atomically increase the > > reference count. So this one is indeed a bit special. And using a > > mutex to protect the reference counts could still work here. > > I agree with your analyse. My plan was a bit different. I wanted to > properly track references of elements on the list. That means that > kern_sysctl() could now be freeing a struct file, which shouldn't be > a problem. > > > What I don't understand is why you also protect the manipulation of > > the file descriptor tables with the mutex in some places. Those > > tables are already protected by a lock! > > That's the most important part of this diff. My goal is to ensure > that a descriptor returned by fd_getfile() is and will stay valid > during the whole execution of an unlocked syscall. The file descriptor doesn't have to stay valid (i.e. visible). Only the struct file associated with it. But I guess that is what you mean. > To avoid races with another thread that might be clearing our pointer > in `fd_ofiles', we need more than atomic operations. For that we need > to serialize the threads. The most simple way to do so is with a mutex > on a different data structure. Either global, like in my diff, or as > visa@ suggested in the corresponding `fdp'. Right. Another case of trying to take a reference without holding one already. The trick here is to use the fact that as long as there is a file descriptor allocated for this open file the reference count is at least 1. So in that case taking a reference is safe. Your global mutex does indeed do the trick. But why aren't you using the file descriptor table rwlock for this purpose? Is that because there is a lock ordering problem between the kernel lock and that rwlock? > > All in all, I think that using atomic operations for the reference > > count would really help. > > I agree, I just wanted to keep the logic easy in a first step. It > shouldn't be hard to split the mutex I'm using. The thing I'm a little bit worried about is that by inlining part of FREF() in various places it will be harder to eventually use atomic reference counts. On the other hand those cases serve as a marker for the spots where the reference counting needs a little bit more attention. I'd like to have an answer to my question about the rwlock. But if the answer is indeed that there is a rwlock, I think this diff is good to go. > > > Index: kern/kern_descrip.c > > > === > > > RCS file: /cvs/src/sys/kern/kern_descrip.c,v > > > retrieving revision 1.166 > > > diff -u -p -r1.166 kern_descrip.c > > > --- kern/kern_descrip.c 18 Jun 2018 09:15:05 - 1.166 > > > +++ kern/kern_descrip.c 18 Jun 2018 09:16:20 - > > > @@ -66,7 +66,11 @@ > > > > > > /* > > > * Descriptor management. > > > + * > > > + * We need to block interrupts as long as `fhdlk' is being taken > > > + * with and without the KERNEL_LOCK(). > > > */ > > > +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR); > > > struct filelist filehead;/* head of list of open files */ > > > int numfiles;/* actual number of open files
Prevent races w/ f_data
There's one place in our kernel where `f_data' is overwritten while the descriptor sits in multiple shared data structures. It is in diskmap. We want to avoid this situation to be able to treat f_data as immutable. So the diff below remove `fp' from the shared data structures before it gets modified. It is not a complete solution to the reference grabbing problem of vnode. There's still a window where a race is possible between the moment where we release the mutex and increase the vnode's reference. The KERNEL_LOCK() is what protects us for now. However this problem is the next one to solve to be able to unlock syscalls messing with vnodes. Still this diff is a step towards that. Note that this change only apply to vnodes, so it doesn't matter for socket-related syscalls. Ok? Index: dev/diskmap.c === RCS file: /cvs/src/sys/dev/diskmap.c,v retrieving revision 1.20 diff -u -p -r1.20 diskmap.c --- dev/diskmap.c 9 May 2018 08:42:02 - 1.20 +++ dev/diskmap.c 19 Jun 2018 10:38:37 - @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd struct filedesc *fdp = p->p_fd; struct file *fp = NULL; struct vnode *vp = NULL, *ovp; - char *devname; + char *devname, flags; int fd, error = EINVAL; if (cmd != DIOCMAP) @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd vput(ovp); } + /* Remove the file while we're modifying it to prevent races. */ + flags = fdp->fd_ofileflags[fd]; + mtx_enter(); + fdp->fd_ofiles[fd] = NULL; + fdp->fd_ofileflags[fd] = 0; + if (fp->f_iflags & FIF_INSERTED) + LIST_REMOVE(fp, f_list); + mtx_leave(); + fp->f_data = (caddr_t)vp; fp->f_offset = 0; mtx_enter(>f_mtx); @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd fp->f_rbytes = 0; fp->f_wbytes = 0; mtx_leave(>f_mtx); + + /* Insert it back where it was. */ + fdinsert(fdp, fd, flags, fp); VOP_UNLOCK(vp);
Re: Unlock sendit-based syscalls
On 18/06/18(Mon) 18:18, Mark Kettenis wrote: > > Date: Mon, 18 Jun 2018 11:24:00 +0200 > > From: Martin Pieuchot > > > > Diff below is the last of the serie to remove the KERNEL_LOCK() from > > sendto(2) and sendmsg(2) for sockets protected by the NET_LOCK(). > > > > As explained previously [0] this diff uses a simple non-intrusive > > approached based on a single mutex. I'd like to get this in before > > doing any refactoring of this layer to get real test coverages. > > > > [0] https://marc.info/?l=openbsd-tech=152699644924165=2 > > > > Tests? Comments? Oks? > > At first sight this diff seems to mess up the locking quite a bit. > I'm sure that is intentional, but still I'm not sure this is the right > direction. In my mind we should at some point in the future be able > to use atomic operations to manage the reference count on struct file > (f_count). Using atomic operations has the huge benefit of not having > lock ordering problems! But it looks as if this diff gets us further > away from that goal. > > You'll still need a lock for the global list of course. It is easy to > protect insertions and removals. The real problem is indeed > fd_iterfile(). There you want to take a reference but that isn't > really safe since you don't necessarily hold one already. That's why > we have that f_count == 0 check there. And as long as you're holding > the lock that protects the list you know the struct file is not going > to disappear. So if f_count > 0 you can safely take a reference. But > calling FREF while holding a mutex is problematic because of > vfs_stall_barrier(). You're correct that vfs_stall_barrier() isn't > required here. All you need to do is atomically increase the > reference count. So this one is indeed a bit special. And using a > mutex to protect the reference counts could still work here. I agree with your analyse. My plan was a bit different. I wanted to properly track references of elements on the list. That means that kern_sysctl() could now be freeing a struct file, which shouldn't be a problem. > What I don't understand is why you also protect the manipulation of > the file descriptor tables with the mutex in some places. Those > tables are already protected by a lock! That's the most important part of this diff. My goal is to ensure that a descriptor returned by fd_getfile() is and will stay valid during the whole execution of an unlocked syscall. To avoid races with another thread that might be clearing our pointer in `fd_ofiles', we need more than atomic operations. For that we need to serialize the threads. The most simple way to do so is with a mutex on a different data structure. Either global, like in my diff, or as visa@ suggested in the corresponding `fdp'. > All in all, I think that using atomic operations for the reference > count would really help. I agree, I just wanted to keep the logic easy in a first step. It shouldn't be hard to split the mutex I'm using. > > Index: kern/kern_descrip.c > > === > > RCS file: /cvs/src/sys/kern/kern_descrip.c,v > > retrieving revision 1.166 > > diff -u -p -r1.166 kern_descrip.c > > --- kern/kern_descrip.c 18 Jun 2018 09:15:05 - 1.166 > > +++ kern/kern_descrip.c 18 Jun 2018 09:16:20 - > > @@ -66,7 +66,11 @@ > > > > /* > > * Descriptor management. > > + * > > + * We need to block interrupts as long as `fhdlk' is being taken > > + * with and without the KERNEL_LOCK(). > > */ > > +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR); > > struct filelist filehead; /* head of list of open files */ > > int numfiles; /* actual number of open files */ > > > > @@ -195,16 +199,18 @@ fd_iterfile(struct file *fp, struct proc > > { > > struct file *nfp; > > > > + mtx_enter(); > > if (fp == NULL) > > nfp = LIST_FIRST(); > > else > > nfp = LIST_NEXT(fp, f_list); > > > > - /* don't FREF when f_count == 0 to avoid race in fdrop() */ > > + /* don't refcount when f_count == 0 to avoid race in fdrop() */ > > while (nfp != NULL && nfp->f_count == 0) > > nfp = LIST_NEXT(nfp, f_list); > > if (nfp != NULL) > > - FREF(nfp); > > + nfp->f_count++; > > + mtx_leave(); > > > > if (fp != NULL) > > FRELE(fp, p); > > @@ -217,10 +223,17 @@ fd_getfile(struct filedesc *fdp, int fd) > > { > > struct file *fp; > > > > - if ((u_int)fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[fd]) == NULL) > > + vfs_stall_barrier(); > > + > > + if ((u_int)fd >= fdp->fd_nfiles) > > return (NULL); > > > > - FREF(fp); > > + mtx_enter(); > > + fp = fdp->fd_ofiles[fd]; > > + if (fp != NULL) > > + fp->f_count++; > > + mtx_leave(); > > + > > return (fp); > > } > > > > @@ -671,6 +684,8 @@ fdinsert(struct filedesc *fdp, int fd, i > > struct file *fq; > > > > fdpassertlocked(fdp); >
[patch] Add omitting period for ktrace(2) manual
Hi tech@, To be consistent with other trace points, this patch adds omitting period for KTRFAC_STRUCT, thanks! Index: ktrace.2 === RCS file: /cvs/src/lib/libc/sys/ktrace.2,v retrieving revision 1.35 diff -u -p -r1.35 ktrace.2 --- ktrace.228 Nov 2017 06:03:41 - 1.35 +++ ktrace.219 Jun 2018 08:10:38 - @@ -108,7 +108,7 @@ Trace all I/O .It Dv KTRFAC_PSIG Trace posted signals. .It Dv KTRFAC_STRUCT -Trace various structs +Trace various structs. .It Dv KTRFAC_USER Trace user data coming from .Xr utrace 2 -- Best Regards Nan Xiao