file(9) fix
The FRELE() macro described in file(9) as void FRELE(), but actually it has return value and it's return value is used by closef(). Index: share/man/man9/file.9 === RCS file: /home/cvsync/openbsd-cvs/src/share/man/man9/file.9,v retrieving revision 1.12 diff -u -p -r1.12 file.9 --- share/man/man9/file.9 4 Jun 2013 19:27:06 - 1.12 +++ share/man/man9/file.9 30 Apr 2015 02:33:43 - @@ -37,7 +37,7 @@ .Fn fdrelease struct proc *p int fd .Ft void .Fn FREF struct file *fp -.Ft void +.Ft int .Fn FRELE struct file *fp struct proc *p .Ft struct file * .Fn fd_getfile struct filedesc *fdp int fd
replace f_count modifications by FREF() and FRELE()
FREF() and FRELE() should be used for modify file reference count, so direct f_count modification replaced by their calls. Only one direct f_count decrement was kept in closef() since FRELE() call looks inapplicable here. Index: kern/kern_descrip.c === RCS file: /home/cvsync/openbsd-cvs/src/sys/kern/kern_descrip.c,v retrieving revision 1.117 diff -u -p -r1.117 kern_descrip.c --- kern/kern_descrip.c 14 Mar 2015 03:38:50 - 1.117 +++ kern/kern_descrip.c 30 Apr 2015 02:08:52 - @@ -566,7 +566,7 @@ finishdup(struct proc *p, struct file *f fdp-fd_ofiles[new] = fp; fdp-fd_ofileflags[new] = fdp-fd_ofileflags[old] ~UF_EXCLOSE; - fp-f_count++; + FREF(fp); FRELE(fp, p); if (dup2 oldfp == NULL) fd_used(fdp, new); @@ -1001,7 +1001,7 @@ fdcopy(struct process *pr) (*fpp)-f_type == DTYPE_SYSTRACE) fdremove(newfdp, i); else - (*fpp)-f_count++; + FREF(*fpp); } /* finish cleaning up kq bits */ @@ -1075,6 +1075,11 @@ closef(struct file *fp, struct proc *p) if (fp-f_count 2) panic(closef: count (%ld) 2, fp-f_count); #endif + /* +* XXX: The fp has its usecount bumped by FREF() call. FRELE() +* call will not destroy fp here. This direct modification +* kept here until usecount logic will be refactored. +*/ fp-f_count--; /* @@ -1249,7 +1254,7 @@ dupfdopen(struct filedesc *fdp, int indx fdp-fd_ofiles[indx] = wfp; fdp-fd_ofileflags[indx] = (fdp-fd_ofileflags[indx] UF_EXCLOSE) | (fdp-fd_ofileflags[dfd] ~UF_EXCLOSE); - wfp-f_count++; + FREF(wfp); fd_used(fdp, indx); return (0); } Index: kern/uipc_usrreq.c === RCS file: /home/cvsync/openbsd-cvs/src/sys/kern/uipc_usrreq.c,v retrieving revision 1.80 diff -u -p -r1.80 uipc_usrreq.c --- kern/uipc_usrreq.c 28 Mar 2015 23:50:55 - 1.80 +++ kern/uipc_usrreq.c 30 Apr 2015 02:08:52 - @@ -837,7 +837,7 @@ morespace: } memcpy(rp, fp, sizeof fp); rp--; - fp-f_count++; + FREF(fp); fp-f_msgcount++; unp_rights++; } @@ -847,8 +847,8 @@ fail: for ( ; i 0; i--) { rp++; memcpy(fp, rp, sizeof(fp)); - fp-f_count--; fp-f_msgcount--; + FRELE(fp, NULL); unp_rights--; } @@ -962,7 +962,7 @@ unp_gc(void) *fpp++ = fp; nunref++; FREF(fp); - fp-f_count++; + FREF(fp); } } for (i = nunref, fpp = extra_ref; --i = 0; ++fpp) Index: nfs/nfs_syscalls.c === RCS file: /home/cvsync/openbsd-cvs/src/sys/nfs/nfs_syscalls.c,v retrieving revision 1.99 diff -u -p -r1.99 nfs_syscalls.c --- nfs/nfs_syscalls.c 14 Mar 2015 03:38:52 - 1.99 +++ nfs/nfs_syscalls.c 30 Apr 2015 02:08:52 - @@ -276,7 +276,7 @@ nfssvc_addsock(struct file *fp, struct m } slp-ns_so = so; slp-ns_nam = mynam; - fp-f_count++; + FREF(fp); slp-ns_fp = fp; s = splsoftnet(); so-so_upcallarg = (caddr_t)slp;
Re: On github now
On 17 Apr 2015, at 12:49, Martin Pieuchot m...@openbsd.org wrote: On 16/04/15(Thu) 15:24, kanonenvogel@gmail.com wrote: Well, lets begin. In the future, I wish to have fd_getfile() returning acquired fp instance. The goal is to not to have pointer to destroyed fp instance in FREF()/FRELE()/fd_getfile() races. This one requres modification of getsock(), getvnode() and dupfdopen() functions, they must receive pointer to struct proc instance for FRELE() call on referenced fp instance while they have internal error cases. While getsock(), getvnode() and dupfdopen() functions are called, struct proc instance exists, so their struct filedesc * arg can be replaced by struct proc * arg which contains pointer to struct filedesc”. The races will be appeared right after at least one FRELE(), FREF() or fd_getfile() call will be done outside kernel lock. The “outside kernel lock call capability requires a little more refactoring, but for this functions only, not system-wide. Now we have something like: if((fp = fd_getfile(fds, fd)) == NULL) goto error; /* * fp can be destroyed here by FRELE() call on other cpu */ FREF(fp); The goal is to avoid this situation. I came to the same conclusion when I wrote these diffs. I think the first 3 are good and simple enough to be committed, somebody else agree? The fourth one that move FREF() inside fd_getfile() is IMHO incomplete. As you can see I putted some XXX where the existing code was calling fd_getfile() without incrementing the reference count. Why did you decide to delete them? Well, my version of fd_getfile() requires “struct proc *” arg for proper funref() call within. So my original patchset has my own modification of getsock(), getvnode() and dupfdopen(). They has “struct filedesc *” and “struct file *” arguments together, but the only “struct file * is enough. And I used your version. :) At first i modified fd_getfile() internals for acquisition/release races on smp machine. Atomic increment of f_count is required in smp case. Kernel sources has the strange and wrong magic with f_count field. closef() wants fp-f_count to be = 2, and closef()’s callers do this magic for panic prevention. falloc() and FILE_SET_MATURE() do this magic too. Now, newly created “struct file” instance has f_count == 2. It’s wrong, and below I describe situation without this magic. Minimal f_count value of existing “struct file” instance is 1. Minimal f_count value of existing and acquired “struct file” instance is 2. Minimal f_count value of acquired “struct file” instance is 1. “existing” means “exists in “struct filedesc” instance and can be obtained by fd_getfile() call”. “acquired” means “returned by fd_getfile() call”. “struct filedesc” instance holds existing “struct file” instances. fdalloc() creates and places “struct file” instance to “struct filedesc” container fdrelease() removes “struct file” instance from “struct filedesc”, but delete it only if f_count is 1 (fp exists and has no references). fp instance can be shared between multiple processes and can be shared by multiple “struct file” correlated syscalls of one multithreaded process. So, we have races and situation can be: “fp instance” must be removed from “struct filedesc” but kept for current “struct file” instance users. Last of them will destroy it by FRELE() call. If we don’t increment f_count within fd_getfile(), we can return pointer to already destroyed “struct file” instance. For example one thread calls fd_getfile() and other calls fdrelease(). If fd_getfile() caller wins, f_count is at least 1 and instance is not destroyed, but removed from “struct filedesc” container and can’t be obtained by next fd_getfile() call. if fdrelease() wins it removes and destroys “struct file” instance and fd_getfile() will return NULL. the real fucntions must be: struct file * fd_getfile(struct filedesc *fdp, int fd, struct proc *p) { #ifdef MULTIPROCESSOR struct file *fp; unsigned long count; if(fd 0 || fd = fdp-fd_nfiles) return NULL; restart: if ((fp = fdp-fd_ofiles[fd]) == NULL) return NULL; if ((count = fp-f_count) == 0) return NULL; if (atomic_cas_ulong(fp-f_count, count, count + 1) != count) goto restart; if ((fp != fdp-fd_ofiles[fd])) { funref(fp, p); goto restart; } if (!FILE_IS_USABLE(fp)) { funref(fp, p); return (NULL); } return fp; #else struct file *fp; if(fd 0 || fd = fdp-fd_nfiles) return NULL; if ((fp != fdp-fd_ofiles[fd])) return NULL; if (!FILE_IS_USABLE(fp)) return NULL; fp-f_count++; return fp; #endif } /* * FREF() replacement, must be called on * _acquired_ fp only */ static __inline void fref(struct file *fp) { #ifdef
Re: On github now
Well, lets begin. In the future, I wish to have fd_getfile() returning acquired fp instance. The goal is to not to have pointer to destroyed fp instance in FREF()/FRELE()/fd_getfile() races. This one requres modification of getsock(), getvnode() and dupfdopen() functions, they must receive pointer to struct proc instance for FRELE() call on referenced fp instance while they have internal error cases. While getsock(), getvnode() and dupfdopen() functions are called, struct proc instance exists, so their struct filedesc * arg can be replaced by struct proc * arg which contains pointer to struct filedesc”. The races will be appeared right after at least one FRELE(), FREF() or fd_getfile() call will be done outside kernel lock. The “outside kernel lock call capability requires a little more refactoring, but for this functions only, not system-wide. Now we have something like: if((fp = fd_getfile(fds, fd)) == NULL) goto error; /* * fp can be destroyed here by FRELE() call on other cpu */ FREF(fp); The goal is to avoid this situation. Should I checkout CURRENT and patch it or 5.7 is fine too? I attach already exitig patches for git tree. If it required, I'll remake them and send one after another. P.S. I'm not a native speaker and my english is ugly. Sorry. 0001-getsock-api-modification.patch Description: Binary data 0002-getvnode-api-modification.patch Description: Binary data 0003-dupfdopen-api-modification.patch Description: Binary data 0004-fd_getfile-returns-acquired-fp-instance-now.patch Description: Binary data
Re: File protection, second attempt
On 15 Apr 2015, at 19:45, Mike Belopuhov m...@belopuhov.com wrote: On 15 April 2015 at 13:29, kanonenvogel@gmail.com kanonenvogel@gmail.com wrote: On 14 Apr 2015, at 18:35, Mike Belopuhov m...@belopuhov.com wrote: Supposedly you don't have to KERNEL_LOCK for pool_{get,put} anymore. Underlying uvm calls are not mp safe True. and not protected by KERNEL_LOCK() calls. They are, see pool_allocator_alloc. Ok I see. But pool_get() calls msleep() which is quite similar to tlseep() but lacks the 'kernel_lock is locked' assert. Except ktrace related calls those functions look mp safe, and my system didn't crash without this assertion. Assertion KASSERT(timo || __mp_lock_held(kernel_lock)) looks here. Also system has didn't crash with unlocked tsleep() and wakeup() calls. getnanotime() and friends functions are mp safe too. 0001-Wrong-assertion-from-tsleep-removed.patch Description: Binary data 0002-Pipe-locks.patch Description: Binary data
On github now
Well, I’ve put /usr/src/sys subtree from 5.7 with my patches on guthub. I would really like to get it more traction on that. https://github.com/Kanonenvogel/openbsd-sys-5.7-smp
Re: File protection, second attempt
On 14 Apr 2015, at 18:35, Mike Belopuhov m...@belopuhov.com wrote: Supposedly you don't have to KERNEL_LOCK for pool_{get,put} anymore. Underlying uvm calls are not mp safe and not protected by KERNEL_LOCK() calls.
Re: File protection, second attempt
On 12 Apr 2015, at 21:02, Martin Pieuchot m...@openbsd.org wrote: It's only PoF just for to show my hypothetical roadmap. Can you explain what need to be protected from what? 1. Filelist, nfiles and operations with them are protected by rwlock. Why not, could you describe why they need a lock? A diff just to do that would be much easier review. filelist is accessed by falloc(), fdrop(), unp_gc and sysctl_file() functions. After corresponding callers will be unlocked, their calls and filelist modifications/reading become asynchronous, so filelist need to be protected. I want to have read value - or/and - store value chain not be broken by races, so f_iflags and f_flag operations are atomic. 2. Garbage collector's flags FIF_MARK and FIF_DEFER moved from f_iflags to new field f_gc_flags (compatibility with pstat was kept). Why did you decide to split f_iflags in two? What's the problem this move is solving? Here too a diff just to do that would be much easier to review. Those flags exists only for sockets. I suppose they can be moved out from struct file but later. Those flags are modified/accessed in only one mp locked place, so they does'n need any protection now, and their modification doesn't affect on FIF_LARVAL and FIF_HASLOCK flags. 3. Operations over f_iflags are atomic. FIF_LARVAL set only once and FIF_HASLOCK isn't set too frequent. Why are you using atomic operations for these flags? Which scenario that would help? Did you consider any alternative? FIF_LARVAL looks unnecessary, because file can become mature after f_ops field was set. Of course it is just assignment and it can be non-atomic. But now, there are some places, there f_ops methods called before FIF_LARVAL flag is set, so if FIF_LARVAL will gone, they should be rewritten before. FIF_HASLOCK can be removed too. it checked only in vn_closefile(), but this flag doesn’t indicate actual vnode lock state, because VOP_ADVLOCK()’s return value is not checked. May be it can be replaced by new vn_islocked() function, which will check actual vnode lock possibility and lock state? 4. Set/unset operations over f_flag are atomic. They are not too frequent. Does that mean that as soon as the atomic operation has completed all the conditions that a flag represent are true/false? Since the integrity of the code you're changing is currently protected by the biglock the order of the operations inside the functions does not really matter. Setting the flag atomically is in generally not enough and since you're not giving more details it's hard to dig into a huge diff ;) I want only f_flag consistence. mp locked related code must be reviewed later. 5. Operations over f_count are atomic. Surely those are frequent though they can be non atomic on uniprocessor kernel. Why do they need to be atomic and also why are you doing a dance with the file_lock when you cannot increment the counter? fd_getfile_ref() has not any lock inside, so we have races with callers, which modify f_count. I want to have new value, incremented by 1 with previously known non-null value to avoid funref()/fd_gerfile_ref() races. If I can't increment counter, then it has been modified by someone else, so I re-check it's presence in fd_ofiles array, it's reference counter != 0 and if this fp was acqured, I check it's place in fd_ofiles again; fp != fp_ofiles[fd] means fp_ofiles[fd] was modified, fp actually was closed, and I should release it and try to acquire new file with given fd. 6. f_offset field is not protectd now, it should be protected later. From what should it be protected? Async dofilereadv(), dofilewritev() and sys_lseek() calls will modify this field. Also, off_t is 64bit and it should be protected for reading on 32bit architectures. 7. Counters are not protected now, they should be protected later. Same question :) Same answer :) they are 64 bits and modify/read by async calls. Since I've been lurking in that area too recently, you'll find 5 refactoring diffs attached to this email. It's easier for me to send you the work that I did rather than comment on some things. I understand and agree with 0002, 0003, 0004 and 0005. I don't understand, why we need to check flags inside fd_getfile. One more question, did you consider the fact that the code you're modifying might contain some bugs which are currently not exposed because most of it is run under the KERNEL_LOCK? Yes, of course. It can be painful process :)
File protection, second attempt
This is the second attempt of struct file protection. 1. Filelist, nfiles and operations with them are protected by rwlock. 2. Garbage collector's flags FIF_MARK and FIF_DEFER moved from f_iflags to new field f_gc_flags (compatibility with pstat was kept). 3. Operations over f_iflags are atomic. FIF_LARVAL set only once and FIF_HASLOCK isn't set too frequent. 4. Set/unset operations over f_flag are atomic. They are not too frequent. 5. Operations over f_count are atomic. Surely those are frequent though they can be non atomic on uniprocessor kernel. 6. f_offset field is not protectd now, it should be protected later. 7. Counters are not protected now, they should be protected later. For test purpose, I moved out sys_close, sys_closefrom, sys_dup, sys_dup2 and sys_dup3 from kernel lock, except little mp locked fragments. I also moved out sys_read, sys_readv, sys_pread, sys_preadv, sys_write, sys_writev, sys_pwrite and sys_pwritev syscalls from kernel lock (dofilereadv() and dofilewritev() are mp locked, of course) for struct file object capture/release purpose. With this modifications my system rebuilds kernel in multiple instenses simultaneously. Index: arch/i386/i386/linux_machdep.c === RCS file: /cvs/src/sys/arch/i386/i386/linux_machdep.c,v retrieving revision 1.46 diff -u -p -r1.46 linux_machdep.c --- arch/i386/i386/linux_machdep.c 16 Dec 2014 18:30:03 - 1.46 +++ arch/i386/i386/linux_machdep.c 12 Apr 2015 01:27:22 - @@ -440,8 +440,12 @@ linux_machdepioctl(struct proc *p, void com = SCARG(uap, com); fdp = p-p_fd; - if ((fp = fd_getfile(fdp, fd)) == NULL) + fdplock(fdp); + if (fd_getfile(fdp, fd) == NULL) { + fdpunlock(fdp); return (EBADF); + } + fdpunlock(fdp); switch (com) { #if (NWSDISPLAY 0 defined(WSDISPLAY_COMPAT_USL)) @@ -568,12 +572,13 @@ linux_machdepioctl(struct proc *p, void * XXX hack: if the function returns EJUSTRETURN, * it has stuffed a sysctl return value in pt.data. */ - FREF(fp); + if ((fp = fd_getfile_ref(fdp, fd, p)) == NULL) + return (EBADF); ioctlf = fp-f_ops-fo_ioctl; pt.com = SCARG(uap, com); pt.data = SCARG(uap, data); error = ioctlf(fp, PTIOCLINUX, (caddr_t)pt, p); - FRELE(fp, p); + funref(fp, p); if (error == EJUSTRETURN) { retval[0] = (register_t)pt.data; error = 0; Index: compat/linux/linux_blkio.c === RCS file: /cvs/src/sys/compat/linux/linux_blkio.c,v retrieving revision 1.9 diff -u -p -r1.9 linux_blkio.c --- compat/linux/linux_blkio.c 22 Apr 2012 05:43:14 - 1.9 +++ compat/linux/linux_blkio.c 12 Apr 2015 01:28:20 - @@ -70,9 +70,8 @@ linux_ioctl_blkio(struct proc *p, struct struct disklabel label; fdp = p-p_fd; - if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL) + if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) == NULL) return (EBADF); - FREF(fp); error = 0; ioctlf = fp-f_ops-fo_ioctl; com = SCARG(uap, com); @@ -118,6 +117,6 @@ linux_ioctl_blkio(struct proc *p, struct error = ENOTTY; } - FRELE(fp, p); + funref(fp, p); return error; } Index: compat/linux/linux_cdrom.c === RCS file: /cvs/src/sys/compat/linux/linux_cdrom.c,v retrieving revision 1.11 diff -u -p -r1.11 linux_cdrom.c --- compat/linux/linux_cdrom.c 26 Mar 2014 05:23:42 - 1.11 +++ compat/linux/linux_cdrom.c 12 Apr 2015 01:28:20 - @@ -108,9 +108,8 @@ linux_ioctl_cdrom(p, v, retval) fdp = p-p_fd; - if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL) + if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) == NULL) return (EBADF); - FREF(fp); if ((fp-f_flag (FREAD | FWRITE)) == 0) { error = EBADF; @@ -293,6 +292,6 @@ linux_ioctl_cdrom(p, v, retval) error = sys_ioctl(p, ia, retval); out: - FRELE(fp, p); + funref(fp, p); return (error); } Index: compat/linux/linux_fdio.c === RCS file: /cvs/src/sys/compat/linux/linux_fdio.c,v retrieving revision 1.7 diff -u -p -r1.7 linux_fdio.c --- compat/linux/linux_fdio.c 22 Apr 2012 05:43:14 - 1.7 +++ compat/linux/linux_fdio.c 12 Apr 2015 01:28:20 - @@ -75,10 +75,9 @@ linux_ioctl_fdio(struct proc *p, struct com = (u_long)SCARG(uap, data); fdp = p-p_fd; - if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL) + if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) ==
Re: falloc and related stuff
Struct file again. f_flag isn’t modified often, so it’s modifacation can be atomic. f_msgcount and f_rxfer, f_wxfer, f_seek, f_rbytes, f_wbytes can be protected by rwlock. f_offset protection is actual for vnodes only. FIF_MARK and FIF_DEFER flags are used only by unpc garbage collector. This flags can be moved out from f_iflags, for example to f_unpc_flags, and use their own protection. FIF_HASLOCK checked only in vn_closefile(), but this flag doesn’t indicate actual vnode lock state, because VOP_ADVLOCK()’s return value is not checked. May be it can be replaced by new vn_islocked() function, which will check actual vnode lock possibility and lock state? only FIF_LARVAL remains in f_iflags, this flag sets only once, so it’s modification can be atomic. f_count may be modified and checked under rwlock, but I think atomic ops are better on smp, afaik, with uniprocessor kernel simple increment/decrement over volatile variable will be enough. This modification doesn’t break pstat, all related FIF_* flags can be set in kinfo_file struct. f_offset protection can be done like in patch below. it is just proof-of-concept. I think f_offset protection stuff can be moved to external struct, which will be stored in hash with fp address as key. FIF_MARK and FIF_DEFER stuff can be moved to external struct too, I suppose. Index: compat/common/compat_dir.c === RCS file: /cvs/src/sys/compat/common/compat_dir.c,v retrieving revision 1.11 diff -u -p -r1.11 compat_dir.c --- compat/common/compat_dir.c 16 Dec 2014 21:25:28 - 1.11 +++ compat/common/compat_dir.c 9 Apr 2015 10:40:55 - @@ -51,7 +51,6 @@ readdir_with_callback(struct file *fp, o struct iovec aiov; int eofflag = 0; int error, len, reclen; - off_t newoff = *off; struct vnode *vp; struct vattr va; @@ -84,10 +83,16 @@ again: auio.uio_segflg = UIO_SYSSPACE; auio.uio_procp = curproc; auio.uio_resid = buflen; - auio.uio_offset = newoff; - + if (fp-f_offset == off) + foffset_lock(fp, auio.uio_offset); + else + auio.uio_offset = *off; error = VOP_READDIR(vp, auio, fp-f_cred, eofflag); - *off = auio.uio_offset; + if (fp-f_offset == off) + foffset_unlock(fp, auio.uio_offset); + else + *off = auio.uio_offset; + if (error) goto out; Index: kern/kern_descrip.c === RCS file: /cvs/src/sys/kern/kern_descrip.c,v retrieving revision 1.116 diff -u -p -r1.116 kern_descrip.c --- kern/kern_descrip.c 19 Jan 2015 01:19:17 - 1.116 +++ kern/kern_descrip.c 9 Apr 2015 10:41:10 - @@ -61,6 +61,7 @@ #include sys/event.h #include sys/pool.h #include sys/ktrace.h +#include sys/rwlock.h #include sys/pipe.h @@ -451,9 +452,9 @@ restart: if (fl.l_start == 0 fl.l_len 0) { /* lockf(3) compliance hack */ fl.l_len = -fl.l_len; - fl.l_start = fp-f_offset - fl.l_len; + fl.l_start = foffset_get(fp) - fl.l_len; } else - fl.l_start += fp-f_offset; + fl.l_start += foffset_get(fp); } switch (fl.l_type) { @@ -514,9 +515,9 @@ restart: if (fl.l_start == 0 fl.l_len 0) { /* lockf(3) compliance hack */ fl.l_len = -fl.l_len; - fl.l_start = fp-f_offset - fl.l_len; + fl.l_start = foffset_get(fp) - fl.l_len; } else - fl.l_start += fp-f_offset; + fl.l_start += foffset_get(fp); } if (fl.l_type != F_RDLCK fl.l_type != F_WRLCK @@ -869,6 +870,7 @@ restart: */ nfiles++; fp = pool_get(file_pool, PR_WAITOK|PR_ZERO); + rw_init(fp-f_offset_lck, f_offset_lck); fp-f_iflags = FIF_LARVAL; if ((fq = p-p_fd-fd_ofiles[0]) != NULL) { LIST_INSERT_AFTER(fq, fp, f_list); @@ -1125,6 +1127,47 @@ fdrop(struct file *fp, struct proc *p) pool_put(file_pool, fp); return (error); +} + +off_t +foffset_get(struct file *fp) +{ + off_t offset; + + rw_enter_read(fp-f_offset_lck); + offset = fp-f_offset; + rw_exit_read(fp-f_offset_lck); + + return offset; +} + +void +foffset_lock(struct file *fp, off_t *foffset) +{ + KASSERT(foffset != NULL); + rw_enter_write(fp-f_offset_lck); + while (fp-f_offset_lckf FOFFSET_LOCKED) { + fp-f_offset_lckf |= FOFFSET_LOCK_WAITING; +
Re: falloc and related stuff
On 08 Apr 2015, at 02:31, Philip Guenther guent...@gmail.com wrote: On Tue, Apr 7, 2015 at 3:57 PM, Kanonenvogel kanonenvogel@gmail.com wrote: I have idea to modify falloc() function and related logic. Now, after successful faclloc call, we have half-initialized struct file object, protected by FIF_LARVAL flag. I want to initialise struct file object within falloc() and then put it to fd_ofiles array and filehead list. This modification permits to avoid half-initialization state and remove FIF_LARVAL flag and related logic. The hard case is blocking opens of fifos, as the underlying operation involves a change visible to another process and that therefore cannot be rolled back. Philip Guenther Ok, I understood. And what about make struct file more smp friendly? For example, make operations under f_iflags and f_flags atomic? Index: kern/kern_descrip.c === RCS file: /cvs/src/sys/kern/kern_descrip.c,v retrieving revision 1.116 diff -u -p -r1.116 kern_descrip.c --- kern/kern_descrip.c 19 Jan 2015 01:19:17 - 1.116 +++ kern/kern_descrip.c 8 Apr 2015 09:21:19 - @@ -61,6 +61,7 @@ #include sys/event.h #include sys/pool.h #include sys/ktrace.h +#include sys/atomic.h #include sys/pipe.h @@ -332,6 +333,7 @@ sys_fcntl(struct proc *p, void *v, regis int i, tmp, newmin, flg = F_POSIX; struct flock fl; int error = 0; + int oflag, nflag; restart: if ((fp = fd_getfile(fdp, fd)) == NULL) @@ -384,8 +386,12 @@ restart: break; case F_SETFL: - fp-f_flag = ~FCNTLFLAGS; - fp-f_flag |= FFLAGS((long)SCARG(uap, arg)) FCNTLFLAGS; + do { + oflag = fp-f_flag; + nflag = oflag ~FCNTLFLAGS; + nflag |= FFLAGS((long)SCARG(uap, arg)) FCNTLFLAGS; + } while (atomic_cas_uint(fp-f_flag, oflag, nflag) != oflag); + tmp = fp-f_flag FNONBLOCK; error = (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p); if (error) @@ -394,7 +400,7 @@ restart: error = (*fp-f_ops-fo_ioctl)(fp, FIOASYNC, (caddr_t)tmp, p); if (!error) break; - fp-f_flag = ~FNONBLOCK; + atomic_clearbits_int(fp-f_flag, FNONBLOCK); tmp = 0; (void) (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p); break; @@ -1160,7 +1166,7 @@ sys_flock(struct proc *p, void *v, regis lf.l_len = 0; if (how LOCK_UN) { lf.l_type = F_UNLCK; - fp-f_iflags = ~FIF_HASLOCK; + atomic_clearbits_int(fp-f_iflags, FIF_HASLOCK); error = VOP_ADVLOCK(vp, (caddr_t)fp, F_UNLCK, lf, F_FLOCK); goto out; } @@ -1172,7 +1178,7 @@ sys_flock(struct proc *p, void *v, regis error = EINVAL; goto out; } - fp-f_iflags |= FIF_HASLOCK; + atomic_setbits_int(fp-f_iflags, FIF_HASLOCK); if (how LOCK_NB) error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, lf, F_FLOCK); else Index: kern/sys_generic.c === RCS file: /cvs/src/sys/kern/sys_generic.c,v retrieving revision 1.96 diff -u -p -r1.96 sys_generic.c --- kern/sys_generic.c 12 Feb 2015 22:27:04 - 1.96 +++ kern/sys_generic.c 8 Apr 2015 09:21:19 - @@ -52,6 +52,7 @@ #include sys/stat.h #include sys/malloc.h #include sys/poll.h +#include sys/atomic.h #ifdef KTRACE #include sys/ktrace.h #endif @@ -456,17 +457,17 @@ sys_ioctl(struct proc *p, void *v, regis case FIONBIO: if ((tmp = *(int *)data) != 0) - fp-f_flag |= FNONBLOCK; + atomic_setbits_int(fp-f_flag, FNONBLOCK); else - fp-f_flag = ~FNONBLOCK; + atomic_clearbits_int(fp-f_flag, FNONBLOCK); error = (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p); break; case FIOASYNC: if ((tmp = *(int *)data) != 0) - fp-f_flag |= FASYNC; + atomic_setbits_int(fp-f_flag, FASYNC); else - fp-f_flag = ~FASYNC; + atomic_clearbits_int(fp-f_flag, FASYNC); error = (*fp-f_ops-fo_ioctl)(fp, FIOASYNC, (caddr_t)tmp, p); break; Index: kern/uipc_usrreq.c === RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v retrieving revision 1.79 diff -u -p -r1.79 uipc_usrreq.c --- kern/uipc_usrreq.c 11 Dec 2014 19:21:57 - 1.79 +++ kern/uipc_usrreq.c 8 Apr 2015 09:21:19 - @@ -884,11 +884,11 @@ unp_gc(void) unp_gcing = 1; unp_defer = 0
Re: falloc and related stuff
On 08 Apr 2015, at 17:33, kanonenvogel@gmail.com wrote: Is it a good idea? bad idea because of sys_pread
Re: falloc and related stuff
On 08 Apr 2015, at 17:33, kanonenvogel@gmail.com wrote: Is it a good idea? bad idea because of sys_pread/sys_pwrite
Re: falloc and related stuff
On 08 Apr 2015, at 15:03, Ted Unangst t...@tedunangst.com wrote: The atomic functions are quite expensive on some architectures, so we don't want to just use them everywhere. So, rwlock is better here? Also, can you explain this lines from finishdup() function (openbsd-5., file kern/kern_descrip.c, lines 576-577): fp-f_count++; FRELE(fp, p); it looks useless, or I misunderstood something?
Re: falloc and related stuff
On 08 Apr 2015, at 15:03, Ted Unangst t...@tedunangst.com wrote: Also, this only helps if you're sure that the code reading the flag will do so in an smp safe way. In many cases, the reading code will also need to acquire a lock in order to correctly do something after reading the flag. From the diff context, it looks like most of this code will definitely already have some other lock. What do you think about f_offset protection? Lock file struct object within of_read or fo_write routine? For example for vn_read() int vn_read(struct file *fp, off_t *poff, struct uio *uio, struct ucred *cred) { struct vnode *vp = (struct vnode *)fp-f_data; int error = 0; size_t count = uio-uio_resid; struct proc *p = uio-uio_procp; FILE_LOCK(fp); /* no wrap around of offsets except on character devices */ if (vp-v_type != VCHR count LLONG_MAX - *poff) { FILE_UNLOCK(fp); return (EINVAL); } vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p); uio-uio_offset = *poff; if (vp-v_type != VDIR) error = VOP_READ(vp, uio, (fp-f_flag FNONBLOCK) ? IO_NDELAY : 0, cred); *poff += count - uio-uio_resid; VOP_UNLOCK(vp, 0, p); FILE_UNLOCK(fp); return (error); } Is it a good idea?
falloc and related stuff
Hello. I have idea to modify falloc() function and related logic. Now, after successful faclloc call, we have half-initialized struct file object, protected by FIF_LARVAL flag. I want to initialise struct file object within falloc() and then put it to fd_ofiles array and filehead list. This modification permits to avoid half-initialization state and remove FIF_LARVAL flag and related logic. After, I want to protect struct file by rwlock, add something like SY_NOLOCK flag and logic and begin to move related syscalls from KERNEL_LOCK. What do you think about?