Re: Do you have an AMD Ryzen or Epyc CPU?

2018-06-19 Thread Leo Unglaub

Hi,
I applied your code on my AMD Ryzen 7 1700X. Below is the dmesg. I hope 
this helps, if you have any other AMD Ryzen related stuff that needs 
testing please let me know.


Greetings
Leo



OpenBSD 6.3-current (GENERIC.MP) #2: Wed Jun 20 05:42:54 CEST 2018

batman@batdesk.wayne-enterprises.company:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17106407424 (16313MB)
avail mem = 16447160320 (15685MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xea850 (59 entries)
bios0: vendor American Megatrends Inc. version "1.90" date 09/20/2017
bios0: Micro-Star International Co., Ltd. MS-7A32
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT CRAT CDIT SSDT MCFG HPET SSDT 
UEFI IVRS SSDT SSDT
acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP2(S4) PTXH(S4) OBLW(S4) GPP3(S4) 
GPP4(S4) GPP5(S4) GPP6(S4) GPP7(S4) GPP8(S4) GPP9(S4) GPPA(S4) GPPB(S4) 
GPPC(S4) GPPD(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 1700X Eight-Core Processor, 3400.51 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: eax 0x ebx 0x0100 ecx 0x
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: eax 0x0001 ebx 0x0100 ecx 0x
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: eax 0x0002 ebx 0x0101 ecx 0x
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu3: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu3: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu3: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu3: eax 0x0003 ebx 0x0101 ecx 0x
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 4 (application processor)
cpu4: AMD Ryzen 7 1700X Eight-Core Processor, 3399.99 MHz
cpu4: 

iked support for IKEv2 Message Fragmentation (RFC 7383)

2018-06-19 Thread Tim Stewart
Hello tech@,

My IKEv2 sessions are occasionally down due to transit networks dropping
UDP fragments for one reason or another[1].  It happens frequently
enough that I am considering implementing support for RFC 7383 in iked.

Before I dig in, I feel that I should ask if anyone has already started
on such work.  If not, perhaps someone that is familiar with the code
could suggest an approach at a high level?

Thanks for any advice,

-TimS


[1] Whenver I've asked, the reason is usually something about DDoS
prevention.

--
Tim Stewart
---
Mail:   t...@stoo.org
Matrix: @tim:stoo.org



Re: Do you have an AMD Ryzen or Epyc CPU?

2018-06-19 Thread Durial EB
On Tue, Jun 19, 2018 at 4:32 PM Mark Kettenis 
wrote:

> Looking for people that can run a kernel with the patch below and mail
> me the resulting dmesg.
>
> Thanks,
>
> Mark
>
>
> Index: arch/amd64/amd64/identcpu.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> retrieving revision 1.96
> diff -u -p -r1.96 identcpu.c
> --- arch/amd64/amd64/identcpu.c 7 Jun 2018 04:07:28 -   1.96
> +++ arch/amd64/amd64/identcpu.c 19 Jun 2018 20:27:24 -
> @@ -803,6 +803,11 @@ cpu_topology(struct cpu_info *ci)
> /* Pkg id is the upper remaining bits */
> ci->ci_pkg_id = apicid & ~core_mask;
> ci->ci_pkg_id >>= core_bits;
> +   if (ci->ci_pnfeatset >= 0x801e) {
> +   CPUID(0x801e, eax, ebx, ecx, edx);
> +   printf("cpu%d: eax 0x%08x ebx 0x%08x ecx 0x%08x\n",
> +   ci->ci_cpuid, eax, ebx, ecx);
> +   }
> } else if (strcmp(cpu_vendor, "GenuineIntel") == 0) {
> /* We only support leaf 1/4 detection */
> if (cpuid_level < 4)


Hi Mark,

My rig is running a Ryzen 7 1800x would that suffice for a test?

Ethan

>
>
>


Re: mandoc potential memory leak fix

2018-06-19 Thread Ingo Schwarze
Hi,

Florian Obser wrote on Mon, Jun 18, 2018 at 06:54:49PM +0200:
> On Mon, Jun 18, 2018 at 04:37:32PM +0200, Jan Schreiber wrote:

>> this patch closes potential memory leaks in the mandoc memory
>> wrapper functions and follows the examples in the manpages.

> These are not leaks since mandoc exits via err(3) immediately after an
> allocation failure. Which is the whole point of these wrappers I
> imagine.

Exactly, so please don't commit these changes, they make the code
less readable for no benefit.

Where these functions are used, the calling code typically already
holds hundreds of pointers to allocated memory for the syntax tree
that is in the process of being built.  These functions save us from
having to call functions that recursively free the syntax tree 
from each and every place where memory allocation can fail.

Yours,
  Ingo



Do you have an AMD Ryzen or Epyc CPU?

2018-06-19 Thread Mark Kettenis
Looking for people that can run a kernel with the patch below and mail
me the resulting dmesg.

Thanks,

Mark


Index: arch/amd64/amd64/identcpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.96
diff -u -p -r1.96 identcpu.c
--- arch/amd64/amd64/identcpu.c 7 Jun 2018 04:07:28 -   1.96
+++ arch/amd64/amd64/identcpu.c 19 Jun 2018 20:27:24 -
@@ -803,6 +803,11 @@ cpu_topology(struct cpu_info *ci)
/* Pkg id is the upper remaining bits */
ci->ci_pkg_id = apicid & ~core_mask;
ci->ci_pkg_id >>= core_bits;
+   if (ci->ci_pnfeatset >= 0x801e) {
+   CPUID(0x801e, eax, ebx, ecx, edx);
+   printf("cpu%d: eax 0x%08x ebx 0x%08x ecx 0x%08x\n",
+   ci->ci_cpuid, eax, ebx, ecx);
+   }
} else if (strcmp(cpu_vendor, "GenuineIntel") == 0) {
/* We only support leaf 1/4 detection */
if (cpuid_level < 4)



Re: ospfd: deal with /etc/netstart, changes of netmask and dest_addr

2018-06-19 Thread Remi Locherer
On Tue, Jun 19, 2018 at 03:59:24PM +0100, Stuart Henderson wrote:
> On 2018/06/18 08:53, Remi Locherer wrote:
> > Index: ospfd.h
> > ===
> > RCS file: /cvs/src/usr.sbin/ospfd/ospfd.h,v
> > retrieving revision 1.100
> > diff -u -p -r1.100 ospfd.h
> > --- ospfd.h 11 Feb 2018 02:27:33 -  1.100
> > +++ ospfd.h 12 Jun 2018 20:41:43 -
> ...
> > -struct ifaddrdel {
> > +struct ifaddr {
> > struct in_addr  addr;
> > +   struct in_addr  mask;
> > +   struct in_addr  dst;
> > unsigned intifindex;
> >  };
> 
> I think it would be better to use a different name.
> 
> Even if there's no actual conflict as if_var.h isn't pulled in, this is
> renaming this struct to "struct ifaddr" is at least confusing.

Oh yes, that is unfortunate.

What would be a better name for this struct? 
iface_addr maybe? Or if_addrmask?



Re: On the matter of OpenBSD breaking embargos (KRACK)

2018-06-19 Thread Kevin Chadwick
On Tue, 19 Jun 2018 09:30:05 -0600


> >   
> > > ps. Disable Intel Hyper-Threading where not needed, until we all
> > > know more.  
> > 
> > Is it safer to use bsd.sp for the time being?  
> 
> No, a better solution is coming.   And in snapshots already.

Thankyou for enabling us to patch ASAP.



Re: [patch] Add omitting period for ktrace(2) manual

2018-06-19 Thread Jason McIntyre
On Tue, Jun 19, 2018 at 04:22:26PM +0800, Nan Xiao wrote:
> Hi tech@,
> 
> To be consistent with other trace points, this patch adds omitting
> period for KTRFAC_STRUCT, thanks!
> 

fixed, thanks.
jmc

> Index: ktrace.2
> ===
> RCS file: /cvs/src/lib/libc/sys/ktrace.2,v
> retrieving revision 1.35
> diff -u -p -r1.35 ktrace.2
> --- ktrace.2  28 Nov 2017 06:03:41 -  1.35
> +++ ktrace.2  19 Jun 2018 08:10:38 -
> @@ -108,7 +108,7 @@ Trace all I/O
>  .It Dv KTRFAC_PSIG
>  Trace posted signals.
>  .It Dv KTRFAC_STRUCT
> -Trace various structs
> +Trace various structs.
>  .It Dv KTRFAC_USER
>  Trace user data coming from
>  .Xr utrace 2
> 
> -- 
> Best Regards
> Nan Xiao
> 



Re: Unlock sendit-based syscalls

2018-06-19 Thread Visa Hankala
On Tue, Jun 19, 2018 at 03:58:51PM +0200, Mark Kettenis wrote:
> > Date: Tue, 19 Jun 2018 15:38:01 +0200
> > From: Martin Pieuchot 
> > 
> > On 19/06/18(Tue) 14:55, Mark Kettenis wrote:
> > > > To avoid races with another thread that might be clearing our pointer
> > > > in `fd_ofiles', we need more than atomic operations.  For that we need
> > > > to serialize the threads.  The most simple way to do so is with a mutex
> > > > on a different data structure.  Either global, like in my diff, or as
> > > > visa@ suggested in the corresponding `fdp'.
> > > 
> > > Right.  Another case of trying to take a reference without holding one
> > > already.  The trick here is to use the fact that as long as there is a
> > > file descriptor allocated for this open file the reference count is at
> > > least 1.  So in that case taking a reference is safe.  Your global
> > > mutex does indeed do the trick.  But why aren't you using the file
> > > descriptor table rwlock for this purpose?  Is that because there is a
> > > lock ordering problem between the kernel lock and that rwlock?
> > 
> > I have two reasons.  First I don't want to introduce new sleeping points,
> > secondly I want this mutex to disappear.  IMHO extending the scope of a
> > lock is going in the wrong direction because then we'll want to split it.
> > That's why I'm happy that our discussion made visa@ look at improving this.
> 
> I wouldn't call this extending the scope of the lock.  But yes, it
> might cause unnecessary sleeps if a write lock is held for the purpose
> of opening a file.  The mutex that visa@ proposes trades that in for
> potential contion in the multiple-readers case.

Below is a revised version of the diff. Like mpi@'s diff, it uses
a `fhdlk' mutex for serializing access to the file list. However, that
lock is not used for anything else. Each file descriptor table has a
dedicated mutex that allows fd_getfile() acquire a file reference
safely.

fd_iterfile() uses a compare-and-swap sequence for reference
acquisition. The sequence ensures that once `f_count' has become zero
the value is never raised.

This diff does not unlock any system calls.


Index: kern/kern_descrip.c
===
RCS file: src/sys/kern/kern_descrip.c,v
retrieving revision 1.166
diff -u -p -r1.166 kern_descrip.c
--- kern/kern_descrip.c 18 Jun 2018 09:15:05 -  1.166
+++ kern/kern_descrip.c 19 Jun 2018 15:13:05 -
@@ -66,7 +66,11 @@
 
 /*
  * Descriptor management.
+ *
+ * We need to block interrupts as long as `fhdlk' is being taken
+ * with and without the KERNEL_LOCK().
  */
+struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR);
 struct filelist filehead;  /* head of list of open files */
 int numfiles;  /* actual number of open files */
 
@@ -194,17 +198,25 @@ struct file *
 fd_iterfile(struct file *fp, struct proc *p)
 {
struct file *nfp;
+   unsigned int count;
 
+   mtx_enter();
if (fp == NULL)
nfp = LIST_FIRST();
else
nfp = LIST_NEXT(fp, f_list);
 
/* don't FREF when f_count == 0 to avoid race in fdrop() */
-   while (nfp != NULL && nfp->f_count == 0)
-   nfp = LIST_NEXT(nfp, f_list);
-   if (nfp != NULL)
-   FREF(nfp);
+   while (nfp != NULL) {
+   count = nfp->f_count;
+   if (count == 0) {
+   nfp = LIST_NEXT(nfp, f_list);
+   continue;
+   }
+   if (atomic_cas_uint(>f_count, count, count + 1) == count)
+   break;
+   }
+   mtx_leave();
 
if (fp != NULL)
FRELE(fp, p);
@@ -217,10 +229,17 @@ fd_getfile(struct filedesc *fdp, int fd)
 {
struct file *fp;
 
-   if ((u_int)fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[fd]) == NULL)
+   vfs_stall_barrier();
+
+   if ((u_int)fd >= fdp->fd_nfiles)
return (NULL);
 
-   FREF(fp);
+   mtx_enter(>fd_fplock);
+   fp = fdp->fd_ofiles[fd];
+   if (fp != NULL)
+   atomic_inc_int(>f_count);
+   mtx_leave(>fd_fplock);
+
return (fp);
 }
 
@@ -639,7 +658,7 @@ finishdup(struct proc *p, struct file *f
fdpassertlocked(fdp);
KASSERT(fp->f_iflags & FIF_INSERTED);
 
-   if (fp->f_count == LONG_MAX-2) {
+   if (fp->f_count >= FDUP_MAX_COUNT) {
FRELE(fp, p);
return (EDEADLK);
}
@@ -653,8 +672,10 @@ finishdup(struct proc *p, struct file *f
fd_used(fdp, new);
}
 
+   mtx_enter(>fd_fplock);
fdp->fd_ofiles[new] = fp;
fdp->fd_ofileflags[new] = fdp->fd_ofileflags[old] & ~UF_EXCLOSE;
+   mtx_leave(>fd_fplock);
*retval = new;
 
if (oldfp != NULL) {
@@ -671,22 +692,34 @@ fdinsert(struct filedesc *fdp, int fd, i
struct file *fq;
 
fdpassertlocked(fdp);
+
+   mtx_enter();
+   fp->f_iflags |= 

Re: On the matter of OpenBSD breaking embargos (KRACK)

2018-06-19 Thread Theo de Raadt
Kevin Chadwick  wrote:

> On Fri, 15 Jun 2018 17:28:14 -0600
> 
> 
> > ps. Disable Intel Hyper-Threading where not needed, until we all know
> > more.
> 
> Is it safer to use bsd.sp for the time being?

No, a better solution is coming.   And in snapshots already.



Re: On the matter of OpenBSD breaking embargos (KRACK)

2018-06-19 Thread Kevin Chadwick
On Fri, 15 Jun 2018 17:28:14 -0600


> ps. Disable Intel Hyper-Threading where not needed, until we all know
> more.

Is it safer to use bsd.sp for the time being?



Re: ospfd: deal with /etc/netstart, changes of netmask and dest_addr

2018-06-19 Thread Stuart Henderson
On 2018/06/18 08:53, Remi Locherer wrote:
> Index: ospfd.h
> ===
> RCS file: /cvs/src/usr.sbin/ospfd/ospfd.h,v
> retrieving revision 1.100
> diff -u -p -r1.100 ospfd.h
> --- ospfd.h   11 Feb 2018 02:27:33 -  1.100
> +++ ospfd.h   12 Jun 2018 20:41:43 -
...
> -struct ifaddrdel {
> +struct ifaddr {
>   struct in_addr  addr;
> + struct in_addr  mask;
> + struct in_addr  dst;
>   unsigned intifindex;
>  };

I think it would be better to use a different name.

Even if there's no actual conflict as if_var.h isn't pulled in, this is
renaming this struct to "struct ifaddr" is at least confusing.



Re: Prevent races w/ f_data

2018-06-19 Thread Mark Kettenis
> Date: Tue, 19 Jun 2018 15:45:48 +0200
> From: Martin Pieuchot 
> 
> On 19/06/18(Tue) 15:08, Mark Kettenis wrote:
> > > Date: Tue, 19 Jun 2018 12:51:35 +0200
> > > From: Martin Pieuchot 
> > > 
> > > There's one place in our kernel where `f_data' is overwritten while the
> > > descriptor sits in multiple shared data structures.  It is in diskmap.
> > > 
> > > We want to avoid this situation to be able to treat f_data as immutable.
> > > 
> > > So the diff below remove `fp' from the shared data structures before it
> > > gets modified.  It is not a complete solution to the reference grabbing
> > > problem of vnode.  There's still a window where a race is possible
> > > between the moment where we release the mutex and increase the vnode's
> > > reference.  The KERNEL_LOCK() is what protects us for now.  However this
> > > problem is the next one to solve to be able to unlock syscalls messing
> > > with vnodes.
> > > 
> > > Still this diff is a step towards that.
> > > 
> > > Note that this change only apply to vnodes, so it doesn't matter for
> > > socket-related syscalls.
> > > 
> > > Ok?
> > 
> > I think this is the wrong approach.  Instead of modifying the struct
> > file, this code should create a new struct file.  Then replace the
> > pointer in the file descriptor table with that new file and then drop
> > the reference to the old struct file.  That way f_data *is* immutable.
> 
> I tried that first... well it didn't work for me.
> 
> The problem is that once you swapped `fp' you still need to free the old
> one.  But you don't want to close the associated file, so you end up
> NULLing `f_data' before calling closef()...

You simply call FRELE() on the old struct file.  That should properly
close the file once the last reference goes.  Obviously you'd remove
the code that closes the original vnode from diskmapioctl().  I don't
see how closef() would be called in that scenario, since you're not
releasing the file descriptor.

> On top of that this approach introduces a new error condition if we
> reach `maxfiles', yes this is a degenerated case but I'd prefer not
> have to deal with it.

True.  And it changes the semantics of the DIOCMAP ioctl.  Currently,
if you call DIOCMAP on a dup'ed file descriptor it would change the
open file associated with all of them.  With my suggestion, it will
only change the open file for the file descriptor you're operating on.
The other file descriptor would continue to refer to the original open
file (typically the diskmap device, but not necessarily).  I'd argue
that is actually how this ioctl should behave.

In fact the current behaviour is a bit of a security risk.  I could
open a file, pass the descriptor to another process and then call
DIOCMAP and suddenly the other process has a ilfe descriptor that
refers to a disk device instead of a normal file.

> The approach below returns `fp' into the "larval" or "reserved" state,
> change it, then insert it again.
> 
> > 
> > > Index: dev/diskmap.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/diskmap.c,v
> > > retrieving revision 1.20
> > > diff -u -p -r1.20 diskmap.c
> > > --- dev/diskmap.c 9 May 2018 08:42:02 -   1.20
> > > +++ dev/diskmap.c 19 Jun 2018 10:38:37 -
> > > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > >   struct filedesc *fdp = p->p_fd;
> > >   struct file *fp = NULL;
> > >   struct vnode *vp = NULL, *ovp;
> > > - char *devname;
> > > + char *devname, flags;
> > >   int fd, error = EINVAL;
> > >  
> > >   if (cmd != DIOCMAP)
> > > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > >   vput(ovp);
> > >   }
> > >  
> > > + /* Remove the file while we're modifying it to prevent races. */
> > > + flags = fdp->fd_ofileflags[fd];
> > > + mtx_enter();
> > > + fdp->fd_ofiles[fd] = NULL;
> > > + fdp->fd_ofileflags[fd] = 0;
> > > + if (fp->f_iflags & FIF_INSERTED)
> > > + LIST_REMOVE(fp, f_list);
> > > + mtx_leave();
> > > +
> > >   fp->f_data = (caddr_t)vp;
> > >   fp->f_offset = 0;
> > >   mtx_enter(>f_mtx);
> > > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > >   fp->f_rbytes = 0;
> > >   fp->f_wbytes = 0;
> > >   mtx_leave(>f_mtx);
> > > +
> > > + /* Insert it back where it was. */
> > > + fdinsert(fdp, fd, flags, fp);
> > >  
> > >   VOP_UNLOCK(vp);
> > >  
> > > 
> > > 
> 
> 



Re: Unlock sendit-based syscalls

2018-06-19 Thread Mark Kettenis
> Date: Tue, 19 Jun 2018 15:38:01 +0200
> From: Martin Pieuchot 
> 
> On 19/06/18(Tue) 14:55, Mark Kettenis wrote:
> > > To avoid races with another thread that might be clearing our pointer
> > > in `fd_ofiles', we need more than atomic operations.  For that we need
> > > to serialize the threads.  The most simple way to do so is with a mutex
> > > on a different data structure.  Either global, like in my diff, or as
> > > visa@ suggested in the corresponding `fdp'.
> > 
> > Right.  Another case of trying to take a reference without holding one
> > already.  The trick here is to use the fact that as long as there is a
> > file descriptor allocated for this open file the reference count is at
> > least 1.  So in that case taking a reference is safe.  Your global
> > mutex does indeed do the trick.  But why aren't you using the file
> > descriptor table rwlock for this purpose?  Is that because there is a
> > lock ordering problem between the kernel lock and that rwlock?
> 
> I have two reasons.  First I don't want to introduce new sleeping points,
> secondly I want this mutex to disappear.  IMHO extending the scope of a
> lock is going in the wrong direction because then we'll want to split it.
> That's why I'm happy that our discussion made visa@ look at improving this.

I wouldn't call this extending the scope of the lock.  But yes, it
might cause unnecessary sleeps if a write lock is held for the purpose
of opening a file.  The mutex that visa@ proposes trades that in for
potential contion in the multiple-readers case.



Re: Prevent races w/ f_data

2018-06-19 Thread Martin Pieuchot
On 19/06/18(Tue) 15:08, Mark Kettenis wrote:
> > Date: Tue, 19 Jun 2018 12:51:35 +0200
> > From: Martin Pieuchot 
> > 
> > There's one place in our kernel where `f_data' is overwritten while the
> > descriptor sits in multiple shared data structures.  It is in diskmap.
> > 
> > We want to avoid this situation to be able to treat f_data as immutable.
> > 
> > So the diff below remove `fp' from the shared data structures before it
> > gets modified.  It is not a complete solution to the reference grabbing
> > problem of vnode.  There's still a window where a race is possible
> > between the moment where we release the mutex and increase the vnode's
> > reference.  The KERNEL_LOCK() is what protects us for now.  However this
> > problem is the next one to solve to be able to unlock syscalls messing
> > with vnodes.
> > 
> > Still this diff is a step towards that.
> > 
> > Note that this change only apply to vnodes, so it doesn't matter for
> > socket-related syscalls.
> > 
> > Ok?
> 
> I think this is the wrong approach.  Instead of modifying the struct
> file, this code should create a new struct file.  Then replace the
> pointer in the file descriptor table with that new file and then drop
> the reference to the old struct file.  That way f_data *is* immutable.

I tried that first... well it didn't work for me.

The problem is that once you swapped `fp' you still need to free the old
one.  But you don't want to close the associated file, so you end up
NULLing `f_data' before calling closef()...

On top of that this approach introduces a new error condition if we
reach `maxfiles', yes this is a degenerated case but I'd prefer not
have to deal with it.

The approach below returns `fp' into the "larval" or "reserved" state,
change it, then insert it again.

> 
> > Index: dev/diskmap.c
> > ===
> > RCS file: /cvs/src/sys/dev/diskmap.c,v
> > retrieving revision 1.20
> > diff -u -p -r1.20 diskmap.c
> > --- dev/diskmap.c   9 May 2018 08:42:02 -   1.20
> > +++ dev/diskmap.c   19 Jun 2018 10:38:37 -
> > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > struct filedesc *fdp = p->p_fd;
> > struct file *fp = NULL;
> > struct vnode *vp = NULL, *ovp;
> > -   char *devname;
> > +   char *devname, flags;
> > int fd, error = EINVAL;
> >  
> > if (cmd != DIOCMAP)
> > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > vput(ovp);
> > }
> >  
> > +   /* Remove the file while we're modifying it to prevent races. */
> > +   flags = fdp->fd_ofileflags[fd];
> > +   mtx_enter();
> > +   fdp->fd_ofiles[fd] = NULL;
> > +   fdp->fd_ofileflags[fd] = 0;
> > +   if (fp->f_iflags & FIF_INSERTED)
> > +   LIST_REMOVE(fp, f_list);
> > +   mtx_leave();
> > +
> > fp->f_data = (caddr_t)vp;
> > fp->f_offset = 0;
> > mtx_enter(>f_mtx);
> > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
> > fp->f_rbytes = 0;
> > fp->f_wbytes = 0;
> > mtx_leave(>f_mtx);
> > +
> > +   /* Insert it back where it was. */
> > +   fdinsert(fdp, fd, flags, fp);
> >  
> > VOP_UNLOCK(vp);
> >  
> > 
> > 



Re: Unlock sendit-based syscalls

2018-06-19 Thread Martin Pieuchot
On 19/06/18(Tue) 14:55, Mark Kettenis wrote:
> > To avoid races with another thread that might be clearing our pointer
> > in `fd_ofiles', we need more than atomic operations.  For that we need
> > to serialize the threads.  The most simple way to do so is with a mutex
> > on a different data structure.  Either global, like in my diff, or as
> > visa@ suggested in the corresponding `fdp'.
> 
> Right.  Another case of trying to take a reference without holding one
> already.  The trick here is to use the fact that as long as there is a
> file descriptor allocated for this open file the reference count is at
> least 1.  So in that case taking a reference is safe.  Your global
> mutex does indeed do the trick.  But why aren't you using the file
> descriptor table rwlock for this purpose?  Is that because there is a
> lock ordering problem between the kernel lock and that rwlock?

I have two reasons.  First I don't want to introduce new sleeping points,
secondly I want this mutex to disappear.  IMHO extending the scope of a
lock is going in the wrong direction because then we'll want to split it.
That's why I'm happy that our discussion made visa@ look at improving this.



Re: Prevent races w/ f_data

2018-06-19 Thread Mark Kettenis
> Date: Tue, 19 Jun 2018 12:51:35 +0200
> From: Martin Pieuchot 
> 
> There's one place in our kernel where `f_data' is overwritten while the
> descriptor sits in multiple shared data structures.  It is in diskmap.
> 
> We want to avoid this situation to be able to treat f_data as immutable.
> 
> So the diff below remove `fp' from the shared data structures before it
> gets modified.  It is not a complete solution to the reference grabbing
> problem of vnode.  There's still a window where a race is possible
> between the moment where we release the mutex and increase the vnode's
> reference.  The KERNEL_LOCK() is what protects us for now.  However this
> problem is the next one to solve to be able to unlock syscalls messing
> with vnodes.
> 
> Still this diff is a step towards that.
> 
> Note that this change only apply to vnodes, so it doesn't matter for
> socket-related syscalls.
> 
> Ok?

I think this is the wrong approach.  Instead of modifying the struct
file, this code should create a new struct file.  Then replace the
pointer in the file descriptor table with that new file and then drop
the reference to the old struct file.  That way f_data *is* immutable.

> Index: dev/diskmap.c
> ===
> RCS file: /cvs/src/sys/dev/diskmap.c,v
> retrieving revision 1.20
> diff -u -p -r1.20 diskmap.c
> --- dev/diskmap.c 9 May 2018 08:42:02 -   1.20
> +++ dev/diskmap.c 19 Jun 2018 10:38:37 -
> @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
>   struct filedesc *fdp = p->p_fd;
>   struct file *fp = NULL;
>   struct vnode *vp = NULL, *ovp;
> - char *devname;
> + char *devname, flags;
>   int fd, error = EINVAL;
>  
>   if (cmd != DIOCMAP)
> @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
>   vput(ovp);
>   }
>  
> + /* Remove the file while we're modifying it to prevent races. */
> + flags = fdp->fd_ofileflags[fd];
> + mtx_enter();
> + fdp->fd_ofiles[fd] = NULL;
> + fdp->fd_ofileflags[fd] = 0;
> + if (fp->f_iflags & FIF_INSERTED)
> + LIST_REMOVE(fp, f_list);
> + mtx_leave();
> +
>   fp->f_data = (caddr_t)vp;
>   fp->f_offset = 0;
>   mtx_enter(>f_mtx);
> @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
>   fp->f_rbytes = 0;
>   fp->f_wbytes = 0;
>   mtx_leave(>f_mtx);
> +
> + /* Insert it back where it was. */
> + fdinsert(fdp, fd, flags, fp);
>  
>   VOP_UNLOCK(vp);
>  
> 
> 



Re: Unlock sendit-based syscalls

2018-06-19 Thread Mark Kettenis
> Date: Tue, 19 Jun 2018 10:54:21 +0200
> From: Martin Pieuchot 
> 
> On 18/06/18(Mon) 18:18, Mark Kettenis wrote:
> > > Date: Mon, 18 Jun 2018 11:24:00 +0200
> > > From: Martin Pieuchot 
> > > 
> > > Diff below is the last of the serie to remove the KERNEL_LOCK() from
> > > sendto(2) and sendmsg(2) for sockets protected by the NET_LOCK().
> > > 
> > > As explained previously [0] this diff uses a simple non-intrusive
> > > approached based on a single mutex.  I'd like to get this in before
> > > doing any refactoring of this layer to get real test coverages.
> > > 
> > > [0] https://marc.info/?l=openbsd-tech=152699644924165=2
> > > 
> > > Tests?  Comments?  Oks?
> > 
> > At first sight this diff seems to mess up the locking quite a bit.
> > I'm sure that is intentional, but still I'm not sure this is the right
> > direction.  In my mind we should at some point in the future be able
> > to use atomic operations to manage the reference count on struct file
> > (f_count).  Using atomic operations has the huge benefit of not having
> > lock ordering problems!  But it looks as if this diff gets us further
> > away from that goal.
> > 
> > You'll still need a lock for the global list of course.  It is easy to
> > protect insertions and removals.  The real problem is indeed
> > fd_iterfile().  There you want to take a reference but that isn't
> > really safe since you don't necessarily hold one already.  That's why
> > we have that f_count == 0 check there.  And as long as you're holding
> > the lock that protects the list you know the struct file is not going
> > to disappear.  So if f_count > 0 you can safely take a reference.  But
> > calling FREF while holding a mutex is problematic because of
> > vfs_stall_barrier().  You're correct that vfs_stall_barrier() isn't
> > required here.  All you need to do is atomically increase the
> > reference count.  So this one is indeed a bit special.  And using a
> > mutex to protect the reference counts could still work here.
> 
> I agree with your analyse.  My plan was a bit different.  I wanted to
> properly track references of elements on the list.  That means that
> kern_sysctl() could now be freeing a struct file, which shouldn't be
> a problem.
> 
> > What I don't understand is why you also protect the manipulation of
> > the file descriptor tables with the mutex in some places.  Those
> > tables are already protected by a lock!
> 
> That's the most important part of this diff.  My goal is to ensure
> that a descriptor returned by fd_getfile() is and will stay valid
> during the whole execution of an unlocked syscall.

The file descriptor doesn't have to stay valid (i.e. visible).  Only
the struct file associated with it.  But I guess that is what you mean.

> To avoid races with another thread that might be clearing our pointer
> in `fd_ofiles', we need more than atomic operations.  For that we need
> to serialize the threads.  The most simple way to do so is with a mutex
> on a different data structure.  Either global, like in my diff, or as
> visa@ suggested in the corresponding `fdp'.

Right.  Another case of trying to take a reference without holding one
already.  The trick here is to use the fact that as long as there is a
file descriptor allocated for this open file the reference count is at
least 1.  So in that case taking a reference is safe.  Your global
mutex does indeed do the trick.  But why aren't you using the file
descriptor table rwlock for this purpose?  Is that because there is a
lock ordering problem between the kernel lock and that rwlock?

> > All in all, I think that using atomic operations for the reference
> > count would really help.
> 
> I agree, I just wanted to keep the logic easy in a first step.  It
> shouldn't be hard to split the mutex I'm using. 

The thing I'm a little bit worried about is that by inlining part of
FREF() in various places it will be harder to eventually use atomic
reference counts.  On the other hand those cases serve as a marker for
the spots where the reference counting needs a little bit more
attention.

I'd like to have an answer to my question about the rwlock.  But if
the answer is indeed that there is a rwlock, I think this diff is good
to go.

> > > Index: kern/kern_descrip.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_descrip.c,v
> > > retrieving revision 1.166
> > > diff -u -p -r1.166 kern_descrip.c
> > > --- kern/kern_descrip.c   18 Jun 2018 09:15:05 -  1.166
> > > +++ kern/kern_descrip.c   18 Jun 2018 09:16:20 -
> > > @@ -66,7 +66,11 @@
> > >  
> > >  /*
> > >   * Descriptor management.
> > > + *
> > > + * We need to block interrupts as long as `fhdlk' is being taken
> > > + * with and without the KERNEL_LOCK().
> > >   */
> > > +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR);
> > >  struct filelist filehead;/* head of list of open files */
> > >  int numfiles;/* actual number of open files 

Prevent races w/ f_data

2018-06-19 Thread Martin Pieuchot
There's one place in our kernel where `f_data' is overwritten while the
descriptor sits in multiple shared data structures.  It is in diskmap.

We want to avoid this situation to be able to treat f_data as immutable.

So the diff below remove `fp' from the shared data structures before it
gets modified.  It is not a complete solution to the reference grabbing
problem of vnode.  There's still a window where a race is possible
between the moment where we release the mutex and increase the vnode's
reference.  The KERNEL_LOCK() is what protects us for now.  However this
problem is the next one to solve to be able to unlock syscalls messing
with vnodes.

Still this diff is a step towards that.

Note that this change only apply to vnodes, so it doesn't matter for
socket-related syscalls.

Ok?

Index: dev/diskmap.c
===
RCS file: /cvs/src/sys/dev/diskmap.c,v
retrieving revision 1.20
diff -u -p -r1.20 diskmap.c
--- dev/diskmap.c   9 May 2018 08:42:02 -   1.20
+++ dev/diskmap.c   19 Jun 2018 10:38:37 -
@@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
struct filedesc *fdp = p->p_fd;
struct file *fp = NULL;
struct vnode *vp = NULL, *ovp;
-   char *devname;
+   char *devname, flags;
int fd, error = EINVAL;
 
if (cmd != DIOCMAP)
@@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
vput(ovp);
}
 
+   /* Remove the file while we're modifying it to prevent races. */
+   flags = fdp->fd_ofileflags[fd];
+   mtx_enter();
+   fdp->fd_ofiles[fd] = NULL;
+   fdp->fd_ofileflags[fd] = 0;
+   if (fp->f_iflags & FIF_INSERTED)
+   LIST_REMOVE(fp, f_list);
+   mtx_leave();
+
fp->f_data = (caddr_t)vp;
fp->f_offset = 0;
mtx_enter(>f_mtx);
@@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd
fp->f_rbytes = 0;
fp->f_wbytes = 0;
mtx_leave(>f_mtx);
+
+   /* Insert it back where it was. */
+   fdinsert(fdp, fd, flags, fp);
 
VOP_UNLOCK(vp);
 



Re: Unlock sendit-based syscalls

2018-06-19 Thread Martin Pieuchot
On 18/06/18(Mon) 18:18, Mark Kettenis wrote:
> > Date: Mon, 18 Jun 2018 11:24:00 +0200
> > From: Martin Pieuchot 
> > 
> > Diff below is the last of the serie to remove the KERNEL_LOCK() from
> > sendto(2) and sendmsg(2) for sockets protected by the NET_LOCK().
> > 
> > As explained previously [0] this diff uses a simple non-intrusive
> > approached based on a single mutex.  I'd like to get this in before
> > doing any refactoring of this layer to get real test coverages.
> > 
> > [0] https://marc.info/?l=openbsd-tech=152699644924165=2
> > 
> > Tests?  Comments?  Oks?
> 
> At first sight this diff seems to mess up the locking quite a bit.
> I'm sure that is intentional, but still I'm not sure this is the right
> direction.  In my mind we should at some point in the future be able
> to use atomic operations to manage the reference count on struct file
> (f_count).  Using atomic operations has the huge benefit of not having
> lock ordering problems!  But it looks as if this diff gets us further
> away from that goal.
> 
> You'll still need a lock for the global list of course.  It is easy to
> protect insertions and removals.  The real problem is indeed
> fd_iterfile().  There you want to take a reference but that isn't
> really safe since you don't necessarily hold one already.  That's why
> we have that f_count == 0 check there.  And as long as you're holding
> the lock that protects the list you know the struct file is not going
> to disappear.  So if f_count > 0 you can safely take a reference.  But
> calling FREF while holding a mutex is problematic because of
> vfs_stall_barrier().  You're correct that vfs_stall_barrier() isn't
> required here.  All you need to do is atomically increase the
> reference count.  So this one is indeed a bit special.  And using a
> mutex to protect the reference counts could still work here.

I agree with your analyse.  My plan was a bit different.  I wanted to
properly track references of elements on the list.  That means that
kern_sysctl() could now be freeing a struct file, which shouldn't be
a problem.

> What I don't understand is why you also protect the manipulation of
> the file descriptor tables with the mutex in some places.  Those
> tables are already protected by a lock!

That's the most important part of this diff.  My goal is to ensure
that a descriptor returned by fd_getfile() is and will stay valid
during the whole execution of an unlocked syscall.

To avoid races with another thread that might be clearing our pointer
in `fd_ofiles', we need more than atomic operations.  For that we need
to serialize the threads.  The most simple way to do so is with a mutex
on a different data structure.  Either global, like in my diff, or as
visa@ suggested in the corresponding `fdp'.

> All in all, I think that using atomic operations for the reference
> count would really help.

I agree, I just wanted to keep the logic easy in a first step.  It
shouldn't be hard to split the mutex I'm using. 

> > Index: kern/kern_descrip.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_descrip.c,v
> > retrieving revision 1.166
> > diff -u -p -r1.166 kern_descrip.c
> > --- kern/kern_descrip.c 18 Jun 2018 09:15:05 -  1.166
> > +++ kern/kern_descrip.c 18 Jun 2018 09:16:20 -
> > @@ -66,7 +66,11 @@
> >  
> >  /*
> >   * Descriptor management.
> > + *
> > + * We need to block interrupts as long as `fhdlk' is being taken
> > + * with and without the KERNEL_LOCK().
> >   */
> > +struct mutex fhdlk = MUTEX_INITIALIZER(IPL_MPFLOOR);
> >  struct filelist filehead;  /* head of list of open files */
> >  int numfiles;  /* actual number of open files */
> >  
> > @@ -195,16 +199,18 @@ fd_iterfile(struct file *fp, struct proc
> >  {
> > struct file *nfp;
> >  
> > +   mtx_enter();
> > if (fp == NULL)
> > nfp = LIST_FIRST();
> > else
> > nfp = LIST_NEXT(fp, f_list);
> >  
> > -   /* don't FREF when f_count == 0 to avoid race in fdrop() */
> > +   /* don't refcount when f_count == 0 to avoid race in fdrop() */
> > while (nfp != NULL && nfp->f_count == 0)
> > nfp = LIST_NEXT(nfp, f_list);
> > if (nfp != NULL)
> > -   FREF(nfp);
> > +   nfp->f_count++;
> > +   mtx_leave();
> >  
> > if (fp != NULL)
> > FRELE(fp, p);
> > @@ -217,10 +223,17 @@ fd_getfile(struct filedesc *fdp, int fd)
> >  {
> > struct file *fp;
> >  
> > -   if ((u_int)fd >= fdp->fd_nfiles || (fp = fdp->fd_ofiles[fd]) == NULL)
> > +   vfs_stall_barrier();
> > +
> > +   if ((u_int)fd >= fdp->fd_nfiles)
> > return (NULL);
> >  
> > -   FREF(fp);
> > +   mtx_enter();
> > +   fp = fdp->fd_ofiles[fd];
> > +   if (fp != NULL)
> > +   fp->f_count++;
> > +   mtx_leave();
> > +
> > return (fp);
> >  }
> >  
> > @@ -671,6 +684,8 @@ fdinsert(struct filedesc *fdp, int fd, i
> > struct file *fq;
> >  
> > fdpassertlocked(fdp);
> 

[patch] Add omitting period for ktrace(2) manual

2018-06-19 Thread Nan Xiao
Hi tech@,

To be consistent with other trace points, this patch adds omitting
period for KTRFAC_STRUCT, thanks!

Index: ktrace.2
===
RCS file: /cvs/src/lib/libc/sys/ktrace.2,v
retrieving revision 1.35
diff -u -p -r1.35 ktrace.2
--- ktrace.228 Nov 2017 06:03:41 -  1.35
+++ ktrace.219 Jun 2018 08:10:38 -
@@ -108,7 +108,7 @@ Trace all I/O
 .It Dv KTRFAC_PSIG
 Trace posted signals.
 .It Dv KTRFAC_STRUCT
-Trace various structs
+Trace various structs.
 .It Dv KTRFAC_USER
 Trace user data coming from
 .Xr utrace 2

-- 
Best Regards
Nan Xiao