file(9) fix

2015-04-29 Thread Kanonenvogel
The FRELE() macro described in file(9) as void FRELE(), but actually
it has return value and it's return value is used by closef().

Index: share/man/man9/file.9
===
RCS file: /home/cvsync/openbsd-cvs/src/share/man/man9/file.9,v
retrieving revision 1.12
diff -u -p -r1.12 file.9
--- share/man/man9/file.9   4 Jun 2013 19:27:06 -   1.12
+++ share/man/man9/file.9   30 Apr 2015 02:33:43 -
@@ -37,7 +37,7 @@
 .Fn fdrelease struct proc *p int fd
 .Ft void
 .Fn FREF struct file *fp
-.Ft void
+.Ft int
 .Fn FRELE struct file *fp struct proc *p
 .Ft struct file *
 .Fn fd_getfile struct filedesc *fdp int fd



replace f_count modifications by FREF() and FRELE()

2015-04-29 Thread Kanonenvogel
FREF() and FRELE() should be used for modify file reference count, so
direct f_count modification replaced by their calls. Only one direct f_count
decrement was kept in closef() since FRELE() call looks inapplicable here.

Index: kern/kern_descrip.c
===
RCS file: /home/cvsync/openbsd-cvs/src/sys/kern/kern_descrip.c,v
retrieving revision 1.117
diff -u -p -r1.117 kern_descrip.c
--- kern/kern_descrip.c 14 Mar 2015 03:38:50 -  1.117
+++ kern/kern_descrip.c 30 Apr 2015 02:08:52 -
@@ -566,7 +566,7 @@ finishdup(struct proc *p, struct file *f
 
fdp-fd_ofiles[new] = fp;
fdp-fd_ofileflags[new] = fdp-fd_ofileflags[old]  ~UF_EXCLOSE;
-   fp-f_count++;
+   FREF(fp);
FRELE(fp, p);
if (dup2  oldfp == NULL)
fd_used(fdp, new);
@@ -1001,7 +1001,7 @@ fdcopy(struct process *pr)
(*fpp)-f_type == DTYPE_SYSTRACE)
fdremove(newfdp, i);
else
-   (*fpp)-f_count++;
+   FREF(*fpp);
}
 
/* finish cleaning up kq bits */
@@ -1075,6 +1075,11 @@ closef(struct file *fp, struct proc *p)
if (fp-f_count  2)
panic(closef: count (%ld)  2, fp-f_count);
 #endif
+   /*
+* XXX: The fp has its usecount bumped by FREF() call. FRELE()
+* call will not destroy fp here. This direct modification
+* kept here until usecount logic will be refactored.
+*/
fp-f_count--;
 
/*
@@ -1249,7 +1254,7 @@ dupfdopen(struct filedesc *fdp, int indx
fdp-fd_ofiles[indx] = wfp;
fdp-fd_ofileflags[indx] = (fdp-fd_ofileflags[indx]  UF_EXCLOSE) |
(fdp-fd_ofileflags[dfd]  ~UF_EXCLOSE);
-   wfp-f_count++;
+   FREF(wfp);
fd_used(fdp, indx);
return (0);
 }
Index: kern/uipc_usrreq.c
===
RCS file: /home/cvsync/openbsd-cvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.80
diff -u -p -r1.80 uipc_usrreq.c
--- kern/uipc_usrreq.c  28 Mar 2015 23:50:55 -  1.80
+++ kern/uipc_usrreq.c  30 Apr 2015 02:08:52 -
@@ -837,7 +837,7 @@ morespace:
}
memcpy(rp, fp, sizeof fp);
rp--;
-   fp-f_count++;
+   FREF(fp);
fp-f_msgcount++;
unp_rights++;
}
@@ -847,8 +847,8 @@ fail:
for ( ; i  0; i--) {
rp++;
memcpy(fp, rp, sizeof(fp));
-   fp-f_count--;
fp-f_msgcount--;
+   FRELE(fp, NULL);
unp_rights--;
}
 
@@ -962,7 +962,7 @@ unp_gc(void)
*fpp++ = fp;
nunref++;
FREF(fp);
-   fp-f_count++;
+   FREF(fp);
}
}
for (i = nunref, fpp = extra_ref; --i = 0; ++fpp)
Index: nfs/nfs_syscalls.c
===
RCS file: /home/cvsync/openbsd-cvs/src/sys/nfs/nfs_syscalls.c,v
retrieving revision 1.99
diff -u -p -r1.99 nfs_syscalls.c
--- nfs/nfs_syscalls.c  14 Mar 2015 03:38:52 -  1.99
+++ nfs/nfs_syscalls.c  30 Apr 2015 02:08:52 -
@@ -276,7 +276,7 @@ nfssvc_addsock(struct file *fp, struct m
}
slp-ns_so = so;
slp-ns_nam = mynam;
-   fp-f_count++;
+   FREF(fp);
slp-ns_fp = fp;
s = splsoftnet();
so-so_upcallarg = (caddr_t)slp;



Re: On github now

2015-04-17 Thread kanonenvogel . 87g

On 17 Apr 2015, at 12:49, Martin Pieuchot m...@openbsd.org wrote:

 On 16/04/15(Thu) 15:24, kanonenvogel@gmail.com wrote:
 Well, lets begin.
 
 In the future, I wish to have fd_getfile() returning acquired fp instance.
 The goal is to not to have pointer to destroyed fp instance
 in FREF()/FRELE()/fd_getfile() races. This one requres modification of
 getsock(), getvnode() and dupfdopen() functions, they must receive pointer to
 struct proc instance for FRELE() call on referenced fp instance while they
 have internal error cases. While getsock(), getvnode() and dupfdopen()
 functions are called, struct proc instance exists, so their
 struct filedesc * arg can be replaced by struct proc * arg which contains
 pointer to struct filedesc”.
 
 The races will be appeared right after at least one FRELE(), FREF() or
 fd_getfile() call will be done outside kernel lock. The “outside kernel 
 lock call
 capability requires a little more refactoring, but for this functions only, 
 not
 system-wide.
 
 Now we have something like:
 
 if((fp = fd_getfile(fds, fd)) == NULL)
   goto error;
 
 /*
 * fp can be destroyed here by FRELE() call on other cpu
 */
 
 FREF(fp);
 
 The goal is to avoid this situation.
 
 I came to the same conclusion when I wrote these diffs.  I think the
 first 3 are good and simple enough to be committed, somebody else agree?
 
 The fourth one that move FREF() inside fd_getfile() is IMHO incomplete.
 As you can see I putted some XXX where the existing code was calling 
 fd_getfile() without incrementing the reference count.  Why did you
 decide to delete them?
Well, my version of fd_getfile() requires “struct proc *” arg for proper
funref() call within. So my original patchset has my own modification of
getsock(), getvnode() and dupfdopen(). They has “struct filedesc *” and 
“struct file *” arguments together, but the only “struct file * is enough.
And I used your version. :)

At first i modified fd_getfile() internals for acquisition/release races 
on smp machine. Atomic increment of f_count is required in smp case.

Kernel sources has the strange and wrong magic with f_count field. closef()
wants fp-f_count to be = 2, and closef()’s callers do this magic for panic
prevention. falloc() and FILE_SET_MATURE() do this magic too. Now,
newly created “struct file” instance has f_count == 2. It’s wrong, and below I
describe situation without this magic.

Minimal f_count value of existing “struct file” instance is 1.
Minimal f_count value of existing and acquired “struct file” instance is 2.
Minimal f_count value of acquired “struct file” instance is 1.

“existing” means “exists in “struct filedesc” instance and can be obtained by
fd_getfile() call”.
“acquired” means “returned by fd_getfile() call”. 
“struct filedesc” instance holds existing “struct file” instances.

fdalloc() creates and places “struct file” instance to “struct filedesc” 
container
fdrelease() removes “struct file” instance from “struct filedesc”, but delete it
only if f_count is 1 (fp exists and has no references). fp instance can be
shared between multiple processes and can be shared by multiple
“struct file” correlated syscalls of one multithreaded process. So, we have
races and situation can be: “fp instance” must be removed from
“struct filedesc” but kept for current “struct file” instance users. Last of 
them
will destroy it by FRELE() call. If we don’t increment f_count within 
fd_getfile(),
we can return pointer to already destroyed “struct file” instance. For example
one thread calls fd_getfile() and other calls fdrelease(). If fd_getfile() 
caller
wins, f_count is at least 1 and instance is not destroyed, but removed from
“struct filedesc” container and can’t be obtained by next fd_getfile() call. if
fdrelease() wins it removes and destroys “struct file” instance and fd_getfile()
will return NULL.

the real fucntions must be:

struct file *
fd_getfile(struct filedesc *fdp, int fd, struct proc *p)
{
#ifdef MULTIPROCESSOR
struct file *fp;
unsigned long count;

if(fd 0 || fd = fdp-fd_nfiles)
return NULL;

restart:
if ((fp = fdp-fd_ofiles[fd]) == NULL)
return NULL;
if ((count = fp-f_count) == 0)
return NULL;
if (atomic_cas_ulong(fp-f_count, count, count + 1) != count)
goto restart;
if ((fp != fdp-fd_ofiles[fd])) {
funref(fp, p);
goto restart;
}
if (!FILE_IS_USABLE(fp)) {
funref(fp, p);
return (NULL);
}

return fp;
#else
struct file *fp;

if(fd 0 || fd = fdp-fd_nfiles)
return NULL;
if ((fp != fdp-fd_ofiles[fd]))
return NULL;
if (!FILE_IS_USABLE(fp))
return NULL;
fp-f_count++;

return fp;
#endif
}

/*
* FREF() replacement, must be called on
* _acquired_ fp only
*/
static __inline void
fref(struct file *fp)
{
#ifdef

Re: On github now

2015-04-16 Thread kanonenvogel....@gmail.com
Well, lets begin.

In the future, I wish to have fd_getfile() returning acquired fp instance.
The goal is to not to have pointer to destroyed fp instance
in FREF()/FRELE()/fd_getfile() races. This one requres modification of
getsock(), getvnode() and dupfdopen() functions, they must receive pointer to
struct proc instance for FRELE() call on referenced fp instance while they
have internal error cases. While getsock(), getvnode() and dupfdopen()
functions are called, struct proc instance exists, so their
struct filedesc * arg can be replaced by struct proc * arg which contains
pointer to struct filedesc”.

The races will be appeared right after at least one FRELE(), FREF() or
fd_getfile() call will be done outside kernel lock. The “outside kernel lock 
call
capability requires a little more refactoring, but for this functions only, not
system-wide.

Now we have something like:

if((fp = fd_getfile(fds, fd)) == NULL)
goto error;

/*
 * fp can be destroyed here by FRELE() call on other cpu
 */

FREF(fp);

The goal is to avoid this situation.

Should I checkout CURRENT and patch it or 5.7 is fine too?
I attach already exitig patches for git tree. If it required, I'll
remake them and send one after another.

P.S. I'm not a native speaker and my english is ugly. Sorry.




0001-getsock-api-modification.patch
Description: Binary data


0002-getvnode-api-modification.patch
Description: Binary data


0003-dupfdopen-api-modification.patch
Description: Binary data


0004-fd_getfile-returns-acquired-fp-instance-now.patch
Description: Binary data


Re: File protection, second attempt

2015-04-15 Thread Kanonenvogel

On 15 Apr 2015, at 19:45, Mike Belopuhov m...@belopuhov.com wrote:

 On 15 April 2015 at 13:29, kanonenvogel@gmail.com
 kanonenvogel@gmail.com wrote:
 
 On 14 Apr 2015, at 18:35, Mike Belopuhov m...@belopuhov.com wrote:
 
 Supposedly you don't have to KERNEL_LOCK for pool_{get,put} anymore.
 Underlying uvm calls are not mp safe
 
 True.
 
 and not protected by KERNEL_LOCK() calls.
 
 
 They are, see pool_allocator_alloc.
Ok I see.
But pool_get() calls msleep() which is quite similar to tlseep() but lacks the
'kernel_lock is locked' assert. Except ktrace related calls those functions
look mp safe, and my system didn't crash without this assertion.
Assertion KASSERT(timo || __mp_lock_held(kernel_lock)) looks
 here. Also system has didn't crash with unlocked tsleep()
and wakeup() calls.
getnanotime() and friends functions are mp safe too.



0001-Wrong-assertion-from-tsleep-removed.patch
Description: Binary data


0002-Pipe-locks.patch
Description: Binary data




On github now

2015-04-15 Thread Kanonenvogel
Well, I’ve put /usr/src/sys subtree from 5.7 with my patches on guthub.
I would really like to get it more traction on that.

https://github.com/Kanonenvogel/openbsd-sys-5.7-smp



Re: File protection, second attempt

2015-04-15 Thread kanonenvogel....@gmail.com

On 14 Apr 2015, at 18:35, Mike Belopuhov m...@belopuhov.com wrote:
 
 Supposedly you don't have to KERNEL_LOCK for pool_{get,put} anymore.
Underlying uvm calls are not mp safe and not protected by KERNEL_LOCK() calls.




Re: File protection, second attempt

2015-04-12 Thread Kanonenvogel

On 12 Apr 2015, at 21:02, Martin Pieuchot m...@openbsd.org wrote:
It's only PoF just for to show my hypothetical roadmap.
 Can you explain what need to be protected from what?
 
 1. Filelist, nfiles and operations with them are protected by rwlock.
 
 Why not, could you describe why they need a lock?  A diff just to do
 that would be much easier review.
filelist is accessed by falloc(), fdrop(), unp_gc and sysctl_file() functions. 
After
corresponding callers will be unlocked, their calls and filelist 
modifications/reading
become asynchronous, so filelist need to be protected.

I want to have read value - or/and - store value chain not be broken by
races, so f_iflags and f_flag operations are atomic.
 
 2. Garbage collector's flags FIF_MARK and FIF_DEFER moved from f_iflags to
 new field f_gc_flags (compatibility with pstat was kept).
 
 Why did you decide to split f_iflags in two?  What's the problem this
 move is solving?  Here too a diff just to do that would be much easier
 to review.
Those flags exists only for sockets. I suppose they can be moved out from
struct file but later. Those flags are modified/accessed in only one mp locked
place, so they does'n need any protection now, and their modification doesn't
affect on FIF_LARVAL and FIF_HASLOCK flags.

 3. Operations over f_iflags are atomic. FIF_LARVAL set only once and
 FIF_HASLOCK isn't set too frequent.
 
 Why are you using atomic operations for these flags?  Which scenario
 that would help?   Did you consider any alternative?

FIF_LARVAL looks unnecessary, because file can become mature after
f_ops field was set. Of course it is just assignment and it can be non-atomic.
But now, there are some places, there f_ops methods called before FIF_LARVAL
flag is set, so if FIF_LARVAL will gone, they should be rewritten before.
FIF_HASLOCK can be removed too. it checked only in vn_closefile(), but this flag
doesn’t indicate actual vnode lock state, because VOP_ADVLOCK()’s return value
is not checked. May be it can be replaced by new vn_islocked() function, which 
will
check actual vnode lock possibility and lock state?
 
 4. Set/unset operations over f_flag are atomic. They are not too frequent.
 
 Does that mean that as soon as the atomic operation has completed all the
 conditions that a flag represent are true/false?  Since the integrity of
 the code you're changing is currently protected by the biglock the order
 of the operations inside the functions does not really matter.   Setting
 the flag atomically is in generally not enough and since you're not
 giving more details it's hard to dig into a huge diff ;)
I want only f_flag consistence. mp locked related code must be reviewed later.

 5. Operations over f_count are atomic. Surely those are frequent though they
 can be non atomic on uniprocessor kernel.
 
 Why do they need to be atomic and also why are you doing a dance with
 the file_lock when you cannot increment the counter?
fd_getfile_ref() has not any lock inside, so we have races with callers,
which modify f_count.
I want to have new value, incremented by 1 with previously known non-null
value to avoid funref()/fd_gerfile_ref() races. If I can't increment counter, 
then
it has been modified by someone else, so I re-check it's presence in fd_ofiles 
array,
it's reference counter != 0 and if this fp was acqured, I check it's place in
fd_ofiles again; fp != fp_ofiles[fd] means fp_ofiles[fd] was modified, fp 
actually
was closed, and I should release it and try to acquire new file with given fd.

 6. f_offset field is not protectd now, it should be protected later.
 
 From what should it be protected?
Async dofilereadv(), dofilewritev() and sys_lseek() calls will modify this 
field.
Also, off_t is 64bit and it should be protected for reading on 32bit 
architectures.

 
 7. Counters are not protected now, they should be protected later.
 
 Same question :)
Same answer :) they are 64 bits and modify/read by async calls.

 
 Since I've been lurking in that area too recently, you'll find 5
 refactoring diffs attached to this email.  It's easier for me to
 send you the work that I did rather than comment on some things.
I understand and agree with 0002, 0003, 0004 and 0005.
I don't understand, why we need to check flags inside fd_getfile.

 One more question, did you consider the fact that the code you're
 modifying might contain some bugs which are currently not exposed
 because most of it is run under the KERNEL_LOCK?
Yes, of course. It can be painful process :)




File protection, second attempt

2015-04-12 Thread Kanonenvogel
This is the second attempt of struct file protection.

1. Filelist, nfiles and operations with them are protected by rwlock.
2. Garbage collector's flags FIF_MARK and FIF_DEFER moved from f_iflags to
new field f_gc_flags (compatibility with pstat was kept).
3. Operations over f_iflags are atomic. FIF_LARVAL set only once and
FIF_HASLOCK isn't set too frequent.
4. Set/unset operations over f_flag are atomic. They are not too frequent.
5. Operations over f_count are atomic. Surely those are frequent though they
can be non atomic on uniprocessor kernel.
6. f_offset field is not protectd now, it should be protected later.
7. Counters are not protected now, they should be protected later.

For test purpose, I moved out sys_close, sys_closefrom, sys_dup, sys_dup2
and sys_dup3 from kernel lock, except little mp locked fragments. I also
moved out sys_read, sys_readv, sys_pread, sys_preadv, sys_write, sys_writev,
sys_pwrite and sys_pwritev syscalls from kernel lock (dofilereadv() and
dofilewritev() are mp locked, of course) for struct file object
capture/release purpose. With this modifications my system rebuilds kernel
in multiple instenses simultaneously.

Index: arch/i386/i386/linux_machdep.c
===
RCS file: /cvs/src/sys/arch/i386/i386/linux_machdep.c,v
retrieving revision 1.46
diff -u -p -r1.46 linux_machdep.c
--- arch/i386/i386/linux_machdep.c  16 Dec 2014 18:30:03 -  1.46
+++ arch/i386/i386/linux_machdep.c  12 Apr 2015 01:27:22 -
@@ -440,8 +440,12 @@ linux_machdepioctl(struct proc *p, void 
com = SCARG(uap, com);
 
fdp = p-p_fd;
-   if ((fp = fd_getfile(fdp, fd)) == NULL)
+   fdplock(fdp);
+   if (fd_getfile(fdp, fd) == NULL) {
+   fdpunlock(fdp);
return (EBADF);
+   }
+   fdpunlock(fdp);
 
switch (com) {
 #if (NWSDISPLAY  0  defined(WSDISPLAY_COMPAT_USL))
@@ -568,12 +572,13 @@ linux_machdepioctl(struct proc *p, void 
 * XXX hack: if the function returns EJUSTRETURN,
 * it has stuffed a sysctl return value in pt.data.
 */
-   FREF(fp);
+   if ((fp = fd_getfile_ref(fdp, fd, p)) == NULL)
+   return (EBADF);
ioctlf = fp-f_ops-fo_ioctl;
pt.com = SCARG(uap, com);
pt.data = SCARG(uap, data);
error = ioctlf(fp, PTIOCLINUX, (caddr_t)pt, p);
-   FRELE(fp, p);
+   funref(fp, p);
if (error == EJUSTRETURN) {
retval[0] = (register_t)pt.data;
error = 0;
Index: compat/linux/linux_blkio.c
===
RCS file: /cvs/src/sys/compat/linux/linux_blkio.c,v
retrieving revision 1.9
diff -u -p -r1.9 linux_blkio.c
--- compat/linux/linux_blkio.c  22 Apr 2012 05:43:14 -  1.9
+++ compat/linux/linux_blkio.c  12 Apr 2015 01:28:20 -
@@ -70,9 +70,8 @@ linux_ioctl_blkio(struct proc *p, struct
struct disklabel label;
 
 fdp = p-p_fd;
-   if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL)
+   if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) == NULL)
return (EBADF);
-   FREF(fp);
error = 0;
ioctlf = fp-f_ops-fo_ioctl;
com = SCARG(uap, com);
@@ -118,6 +117,6 @@ linux_ioctl_blkio(struct proc *p, struct
error = ENOTTY;
}
 
-   FRELE(fp, p);
+   funref(fp, p);
return error;
 }
Index: compat/linux/linux_cdrom.c
===
RCS file: /cvs/src/sys/compat/linux/linux_cdrom.c,v
retrieving revision 1.11
diff -u -p -r1.11 linux_cdrom.c
--- compat/linux/linux_cdrom.c  26 Mar 2014 05:23:42 -  1.11
+++ compat/linux/linux_cdrom.c  12 Apr 2015 01:28:20 -
@@ -108,9 +108,8 @@ linux_ioctl_cdrom(p, v, retval)
 
 
fdp = p-p_fd;
-   if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL)
+   if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) == NULL)
return (EBADF);
-   FREF(fp);
 
if ((fp-f_flag  (FREAD | FWRITE)) == 0) {
error = EBADF;
@@ -293,6 +292,6 @@ linux_ioctl_cdrom(p, v, retval)
error = sys_ioctl(p, ia, retval);
 
 out:
-   FRELE(fp, p);
+   funref(fp, p);
return (error);
 }
Index: compat/linux/linux_fdio.c
===
RCS file: /cvs/src/sys/compat/linux/linux_fdio.c,v
retrieving revision 1.7
diff -u -p -r1.7 linux_fdio.c
--- compat/linux/linux_fdio.c   22 Apr 2012 05:43:14 -  1.7
+++ compat/linux/linux_fdio.c   12 Apr 2015 01:28:20 -
@@ -75,10 +75,9 @@ linux_ioctl_fdio(struct proc *p, struct 
com = (u_long)SCARG(uap, data);
 
fdp = p-p_fd;
-   if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL)
+   if ((fp = fd_getfile_ref(fdp, SCARG(uap, fd), p)) == 

Re: falloc and related stuff

2015-04-09 Thread kanonenvogel....@gmail.com
Struct file again.

f_flag isn’t modified often, so it’s modifacation can be atomic.
f_msgcount and f_rxfer, f_wxfer, f_seek, f_rbytes, f_wbytes can be protected by 
rwlock. 
f_offset protection is actual for vnodes only.
FIF_MARK and FIF_DEFER flags are used only by unpc garbage collector. This 
flags can
be moved out from f_iflags, for example to f_unpc_flags, and use their own 
protection.
FIF_HASLOCK checked only in vn_closefile(), but this flag doesn’t indicate 
actual vnode lock
state, because VOP_ADVLOCK()’s return value is not checked. May be it can be 
replaced by
 new vn_islocked() function, which will check actual vnode lock possibility and 
lock state?
only FIF_LARVAL remains in f_iflags, this flag sets only once, so it’s 
modification can be atomic.
f_count may be modified and checked under rwlock, but I think atomic ops are 
better on smp,
afaik, with uniprocessor kernel simple increment/decrement over volatile 
variable will be enough.

This modification doesn’t break pstat, all related FIF_* flags can be set in 
kinfo_file struct.  

f_offset protection can be done like in patch below. it is just 
proof-of-concept. I think f_offset 
protection stuff can be moved to external struct, which will be stored in hash 
with fp address as key.

FIF_MARK and FIF_DEFER stuff can be moved to external struct too, I suppose.

Index: compat/common/compat_dir.c
===
RCS file: /cvs/src/sys/compat/common/compat_dir.c,v
retrieving revision 1.11
diff -u -p -r1.11 compat_dir.c
--- compat/common/compat_dir.c  16 Dec 2014 21:25:28 -  1.11
+++ compat/common/compat_dir.c  9 Apr 2015 10:40:55 -
@@ -51,7 +51,6 @@ readdir_with_callback(struct file *fp, o
struct iovec aiov;
int eofflag = 0;
int error, len, reclen;
-   off_t newoff = *off;
struct vnode *vp;
struct vattr va;

@@ -84,10 +83,16 @@ again:
auio.uio_segflg = UIO_SYSSPACE;
auio.uio_procp = curproc;
auio.uio_resid = buflen;
-   auio.uio_offset = newoff;
-
+   if (fp-f_offset == off)
+   foffset_lock(fp, auio.uio_offset);
+   else
+   auio.uio_offset = *off;
error = VOP_READDIR(vp, auio, fp-f_cred, eofflag);
-   *off = auio.uio_offset;
+   if (fp-f_offset == off)
+   foffset_unlock(fp, auio.uio_offset);
+   else
+   *off = auio.uio_offset;
+
if (error)
goto out;
 
Index: kern/kern_descrip.c
===
RCS file: /cvs/src/sys/kern/kern_descrip.c,v
retrieving revision 1.116
diff -u -p -r1.116 kern_descrip.c
--- kern/kern_descrip.c 19 Jan 2015 01:19:17 -  1.116
+++ kern/kern_descrip.c 9 Apr 2015 10:41:10 -
@@ -61,6 +61,7 @@
 #include sys/event.h
 #include sys/pool.h
 #include sys/ktrace.h
+#include sys/rwlock.h
 
 #include sys/pipe.h
 
@@ -451,9 +452,9 @@ restart:
if (fl.l_start == 0  fl.l_len  0) {
/* lockf(3) compliance hack */
fl.l_len = -fl.l_len;
-   fl.l_start = fp-f_offset - fl.l_len;
+   fl.l_start = foffset_get(fp) - fl.l_len;
} else
-   fl.l_start += fp-f_offset;
+   fl.l_start += foffset_get(fp);
}
switch (fl.l_type) {
 
@@ -514,9 +515,9 @@ restart:
if (fl.l_start == 0  fl.l_len  0) {
/* lockf(3) compliance hack */
fl.l_len = -fl.l_len;
-   fl.l_start = fp-f_offset - fl.l_len;
+   fl.l_start = foffset_get(fp) - fl.l_len;
} else
-   fl.l_start += fp-f_offset;
+   fl.l_start += foffset_get(fp);
}
if (fl.l_type != F_RDLCK 
fl.l_type != F_WRLCK 
@@ -869,6 +870,7 @@ restart:
 */
nfiles++;
fp = pool_get(file_pool, PR_WAITOK|PR_ZERO);
+   rw_init(fp-f_offset_lck, f_offset_lck);
fp-f_iflags = FIF_LARVAL;
if ((fq = p-p_fd-fd_ofiles[0]) != NULL) {
LIST_INSERT_AFTER(fq, fp, f_list);
@@ -1125,6 +1127,47 @@ fdrop(struct file *fp, struct proc *p)
pool_put(file_pool, fp);
 
return (error);
+}
+
+off_t
+foffset_get(struct file *fp)
+{
+   off_t offset;
+
+   rw_enter_read(fp-f_offset_lck);
+   offset = fp-f_offset;
+   rw_exit_read(fp-f_offset_lck);
+   
+   return offset;
+}
+
+void
+foffset_lock(struct file *fp, off_t *foffset)
+{
+   KASSERT(foffset != NULL);
+   rw_enter_write(fp-f_offset_lck);
+   while (fp-f_offset_lckf  FOFFSET_LOCKED) {
+   fp-f_offset_lckf |= FOFFSET_LOCK_WAITING;
+   

Re: falloc and related stuff

2015-04-08 Thread kanonenvogel....@gmail.com

On 08 Apr 2015, at 02:31, Philip Guenther guent...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 3:57 PM, Kanonenvogel kanonenvogel@gmail.com 
 wrote:
 I have idea to modify falloc() function and related logic.
 Now, after successful faclloc call,  we have half-initialized struct file 
 object, protected by FIF_LARVAL flag.
 I want to initialise struct file object within falloc() and then put it to 
 fd_ofiles array and filehead list. This modification
 permits to avoid half-initialization state and remove FIF_LARVAL flag and 
 related logic.
 
 The hard case is blocking opens of fifos, as the underlying operation
 involves a change visible to another process and that therefore cannot
 be rolled back.
 
 
 Philip Guenther
Ok, I understood.
And what about make struct file more smp friendly? For example, make operations 
under
f_iflags and f_flags atomic?

Index: kern/kern_descrip.c
===
RCS file: /cvs/src/sys/kern/kern_descrip.c,v
retrieving revision 1.116
diff -u -p -r1.116 kern_descrip.c
--- kern/kern_descrip.c 19 Jan 2015 01:19:17 -  1.116
+++ kern/kern_descrip.c 8 Apr 2015 09:21:19 -
@@ -61,6 +61,7 @@
 #include sys/event.h
 #include sys/pool.h
 #include sys/ktrace.h
+#include sys/atomic.h
 
 #include sys/pipe.h
 
@@ -332,6 +333,7 @@ sys_fcntl(struct proc *p, void *v, regis
int i, tmp, newmin, flg = F_POSIX;
struct flock fl;
int error = 0;
+   int oflag, nflag;
 
 restart:
if ((fp = fd_getfile(fdp, fd)) == NULL)
@@ -384,8 +386,12 @@ restart:
break;
 
case F_SETFL:
-   fp-f_flag = ~FCNTLFLAGS;
-   fp-f_flag |= FFLAGS((long)SCARG(uap, arg))  FCNTLFLAGS;
+   do {
+   oflag = fp-f_flag;
+   nflag = oflag  ~FCNTLFLAGS;
+   nflag |= FFLAGS((long)SCARG(uap, arg))  FCNTLFLAGS;
+   } while (atomic_cas_uint(fp-f_flag, oflag, nflag) != oflag);
+
tmp = fp-f_flag  FNONBLOCK;
error = (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p);
if (error)
@@ -394,7 +400,7 @@ restart:
error = (*fp-f_ops-fo_ioctl)(fp, FIOASYNC, (caddr_t)tmp, p);
if (!error)
break;
-   fp-f_flag = ~FNONBLOCK;
+   atomic_clearbits_int(fp-f_flag, FNONBLOCK);
tmp = 0;
(void) (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p);
break;
@@ -1160,7 +1166,7 @@ sys_flock(struct proc *p, void *v, regis
lf.l_len = 0;
if (how  LOCK_UN) {
lf.l_type = F_UNLCK;
-   fp-f_iflags = ~FIF_HASLOCK;
+   atomic_clearbits_int(fp-f_iflags, FIF_HASLOCK);
error = VOP_ADVLOCK(vp, (caddr_t)fp, F_UNLCK, lf, F_FLOCK);
goto out;
}
@@ -1172,7 +1178,7 @@ sys_flock(struct proc *p, void *v, regis
error = EINVAL;
goto out;
}
-   fp-f_iflags |= FIF_HASLOCK;
+   atomic_setbits_int(fp-f_iflags, FIF_HASLOCK);
if (how  LOCK_NB)
error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, lf, F_FLOCK);
else
Index: kern/sys_generic.c
===
RCS file: /cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.96
diff -u -p -r1.96 sys_generic.c
--- kern/sys_generic.c  12 Feb 2015 22:27:04 -  1.96
+++ kern/sys_generic.c  8 Apr 2015 09:21:19 -
@@ -52,6 +52,7 @@
 #include sys/stat.h
 #include sys/malloc.h
 #include sys/poll.h
+#include sys/atomic.h
 #ifdef KTRACE
 #include sys/ktrace.h
 #endif
@@ -456,17 +457,17 @@ sys_ioctl(struct proc *p, void *v, regis
 
case FIONBIO:
if ((tmp = *(int *)data) != 0)
-   fp-f_flag |= FNONBLOCK;
+   atomic_setbits_int(fp-f_flag, FNONBLOCK);
else
-   fp-f_flag = ~FNONBLOCK;
+   atomic_clearbits_int(fp-f_flag, FNONBLOCK);
error = (*fp-f_ops-fo_ioctl)(fp, FIONBIO, (caddr_t)tmp, p);
break;
 
case FIOASYNC:
if ((tmp = *(int *)data) != 0)
-   fp-f_flag |= FASYNC;
+   atomic_setbits_int(fp-f_flag, FASYNC);
else
-   fp-f_flag = ~FASYNC;
+   atomic_clearbits_int(fp-f_flag, FASYNC);
error = (*fp-f_ops-fo_ioctl)(fp, FIOASYNC, (caddr_t)tmp, p);
break;
 
Index: kern/uipc_usrreq.c
===
RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.79
diff -u -p -r1.79 uipc_usrreq.c
--- kern/uipc_usrreq.c  11 Dec 2014 19:21:57 -  1.79
+++ kern/uipc_usrreq.c  8 Apr 2015 09:21:19 -
@@ -884,11 +884,11 @@ unp_gc(void)
unp_gcing = 1;
unp_defer = 0

Re: falloc and related stuff

2015-04-08 Thread kanonenvogel . 87g

On 08 Apr 2015, at 17:33, kanonenvogel@gmail.com wrote:
 
 Is it a good idea?

bad idea because of sys_pread



Re: falloc and related stuff

2015-04-08 Thread kanonenvogel....@gmail.com

On 08 Apr 2015, at 17:33, kanonenvogel@gmail.com wrote:
 
 Is it a good idea?

bad idea because of sys_pread/sys_pwrite



Re: falloc and related stuff

2015-04-08 Thread kanonenvogel....@gmail.com

On 08 Apr 2015, at 15:03, Ted Unangst t...@tedunangst.com wrote:

 The atomic functions are quite expensive on
 some architectures, so we don't want to just use them everywhere.
So, rwlock is better here?

Also, can you explain this lines from finishdup() function (openbsd-5., file 
kern/kern_descrip.c, lines 576-577):
fp-f_count++;
FRELE(fp, p);

it looks useless, or I misunderstood something?




Re: falloc and related stuff

2015-04-08 Thread kanonenvogel....@gmail.com

On 08 Apr 2015, at 15:03, Ted Unangst t...@tedunangst.com wrote:
 Also, this only helps if you're sure that the code reading the flag will do so
 in an smp safe way. In many cases, the reading code will also need to acquire
 a lock in order to correctly do something after reading the flag. From the
 diff context, it looks like most of this code will definitely already have
 some other lock.
What do you think about f_offset protection? Lock file struct object within 
of_read or fo_write routine?
For example for vn_read()

int
vn_read(struct file *fp, off_t *poff, struct uio *uio, struct ucred *cred)
{
struct vnode *vp = (struct vnode *)fp-f_data;
int error = 0;
size_t count = uio-uio_resid;
struct proc *p = uio-uio_procp;

FILE_LOCK(fp);
/* no wrap around of offsets except on character devices */
if (vp-v_type != VCHR  count  LLONG_MAX - *poff) {
FILE_UNLOCK(fp);
return (EINVAL);
}
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
uio-uio_offset = *poff;
if (vp-v_type != VDIR)
error = VOP_READ(vp, uio,
(fp-f_flag  FNONBLOCK) ? IO_NDELAY : 0, cred);
*poff += count - uio-uio_resid;
VOP_UNLOCK(vp, 0, p);
FILE_UNLOCK(fp);
return (error);
}

Is it a good idea?





falloc and related stuff

2015-04-07 Thread Kanonenvogel
Hello.
I have idea to modify falloc() function and related logic.
Now, after successful faclloc call,  we have half-initialized struct file 
object, protected by FIF_LARVAL flag.
I want to initialise struct file object within falloc() and then put it to 
fd_ofiles array and filehead list. This modification
permits to avoid half-initialization state and remove FIF_LARVAL flag and 
related logic.
After, I want to protect struct file by rwlock, add something like SY_NOLOCK 
flag and logic and begin
to move related syscalls from KERNEL_LOCK. What do you think about?