Re: F_DUPFD_CLOEXEC implementation

2007-10-02 Thread Davide Libenzi
On Tue, 2 Oct 2007, Denys Vlasenko wrote:

> I have the following proposals:
> 
> * make recv(..., MSG_DONTWAIT) work on any fd
> 
> Sounds neat, but not trivial to implement in current kernel.

This is mildly ugly, if you ask me. Those are socket functions, and the 
flags parameter contains some pretty specific network meanings.



> * new fcntl command F_DUPFL: fcntl(fd, F_DUPFL, n):
>   Analogous to F_DUPFD, but gives you *unshared* copy of the fd.
>   Further seeks, fcntl(fd, F_SETFL, O_NONBLOCK), etc won't affect
>   any other process.

You'll need an ad-hoc copy function though, since your memcpy-based one is 
gonna explode even before memcpy returns ;) You'll have problems with 
ref-counting too. And that layer is not designed to cleanly support that 
operation.
Unfortunately the "clean" solution would involve changing a whole bunch of 
code, and I don't feel exactly sure it'd be worth it.



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: F_DUPFD_CLOEXEC implementation

2007-10-02 Thread Denys Vlasenko
On Monday 01 October 2007 20:04, Davide Libenzi wrote:
> > They don't even need to read in parallel, just having shared fd is enough.
> > Think about pipes, sockets and terminals. A real-world scenario:
> > 
> > * a process started from shell (interactive or shell script)
> > * it sets O_NONBLOCK and does a read from fd 0...
> > * it gets killed (kill -9, whatever)
> > * shell suddenly has its fd 0 in O_NONBLOCK mode
> > * shell and all subsequent commands started from it unexpectedly have
> >   O_NONBLOCKed stdin.
> 
> I told you how in the previous email. You cannot use the:
> 
> 1) set O_NONBLOCK
> 2) read/write
> 3) unset O_NONBLOCK
> 
> in a race-free fashion, w/out wrapping it with a lock (a thing that we 
> don't want to do).

I'm confused. I am saying exactly this same thing: that I cannot
do it atomically using standard unix operations, but I still need
to do a nonblocking read. Why are you explaining to me that it
cannot be done? I *know*. I'm asking what API should be
added/extended to make it possible.

I have the following proposals:

* make recv(..., MSG_DONTWAIT) work on any fd

Sounds neat, but not trivial to implement in current kernel.

* new fcntl command F_DUPFL: fcntl(fd, F_DUPFL, n):
  Analogous to F_DUPFD, but gives you *unshared* copy of the fd.
  Further seeks, fcntl(fd, F_SETFL, O_NONBLOCK), etc won't affect
  any other process.

How hard would it be to implement F_DUPFL in the current kernel?
--
vda
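
A minimal sketch of why a plain F_DUPFD copy does not help here, assuming a POSIX system: the duplicate refers to the same open file description, so file status flags such as O_NONBLOCK are shared with the original. The function name dup_shares_nonblock is hypothetical and exists only for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK set through an F_DUPFD copy of a pipe's read
 * end is visible through the original descriptor, 0 if it is not, and
 * -1 on error. (Hypothetical helper, for illustration only.) */
int dup_shares_nonblock(void)
{
	int p[2];
	int copy, flags, seen;

	if (pipe(p) < 0)
		return -1;

	/* F_DUPFD: new descriptor number, same open file description. */
	copy = fcntl(p[0], F_DUPFD, 0);
	if (copy < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	/* Change the status flags through the copy only... */
	flags = fcntl(copy, F_GETFL);
	if (flags < 0 || fcntl(copy, F_SETFL, flags | O_NONBLOCK) < 0) {
		seen = -1;
	} else {
		/* ...and inspect them through the original. */
		flags = fcntl(p[0], F_GETFL);
		seen = (flags < 0) ? -1 : ((flags & O_NONBLOCK) != 0);
	}

	close(copy);
	close(p[0]);
	close(p[1]);
	return seen;
}
```

The proposed F_DUPFL would be the variant for which this function returns 0; no such command exists in the kernel discussed in this thread.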


Re: F_DUPFD_CLOEXEC implementation

2007-10-01 Thread Davide Libenzi
On Mon, 1 Oct 2007, Denys Vlasenko wrote:

> On Monday 01 October 2007 19:16, Al Viro wrote:
> > * it's on a bunch of cyclic lists.  Have its neighbor
> > go away while you are doing all that crap => boom
> > * there's that thing called current position...  It gets buggered.
> > * overwriting it while another task might be in the middle of
> > syscall involving it => boom
> 
> Hm, I suspected that it's heresy. Any idea how to do it cleanly?
> 
> > * non-cooperative tasks reading *in* *parallel* from the same
> > opened file are going to have a lot more serious problems than agreeing
> > on O_NONBLOCK anyway, so I really don't understand what the hell is that 
> > for.
> 
> They don't even need to read in parallel, just having shared fd is enough.
> Think about pipes, sockets and terminals. A real-world scenario:
> 
> * a process started from shell (interactive or shell script)
> * it sets O_NONBLOCK and does a read from fd 0...
> * it gets killed (kill -9, whatever)
> * shell suddenly has its fd 0 in O_NONBLOCK mode
> * shell and all subsequent commands started from it unexpectedly have
>   O_NONBLOCKed stdin.

I told you how in the previous email. You cannot use the:

1) set O_NONBLOCK
2) read/write
3) unset O_NONBLOCK

in a race-free fashion, w/out wrapping it with a lock (a thing that we 
don't want to do).



PS: send/recv are socket functions, and you really don't want to overload 
them for other fds.



- Davide




Re: F_DUPFD_CLOEXEC implementation

2007-10-01 Thread Michael Tokarev
Al Viro wrote:
> On Mon, Oct 01, 2007 at 11:07:15AM +0100, Denys Vlasenko wrote:
>> Also attached is ndelaytest.c which can be used to test that
>> send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
>> and that other processes never see O_NONBLOCK set.
>>
>> Comments?
> 
> Never send patches during or approaching hangover?
>   * it's on a bunch of cyclic lists.  Have its neighbor
> go away while you are doing all that crap => boom
>   * there's that thing called current position...  It gets buggered.
>   * overwriting it while another task might be in the middle of
> syscall involving it => boom
>   * non-cooperative tasks reading *in* *parallel* from the same
> opened file are going to have a lot more serious problems than agreeing
> on O_NONBLOCK anyway, so I really don't understand what the hell is that for.

Good summary... ;)

But for the last part of the last item - sometimes, definitely more than
once, I wondered why there's no equivalent to recv(MSG_DONTWAIT) for
non-sockets -- why for sockets it's as simple as adding an option (a
single bit), while for all the rest it requires two fcntl calls...
Sometimes it's handy. ;)

Not that I'm arguing for or against such a feature anyway..

/mjt


Re: F_DUPFD_CLOEXEC implementation

2007-10-01 Thread Denys Vlasenko
On Monday 01 October 2007 19:16, Al Viro wrote:
>   * it's on a bunch of cyclic lists.  Have its neighbor
> go away while you are doing all that crap => boom
>   * there's that thing called current position...  It gets buggered.
>   * overwriting it while another task might be in the middle of
> syscall involving it => boom

Hm, I suspected that it's heresy. Any idea how to do it cleanly?

>   * non-cooperative tasks reading *in* *parallel* from the same
> opened file are going to have a lot more serious problems than agreeing
> on O_NONBLOCK anyway, so I really don't understand what the hell is that for.

They don't even need to read in parallel, just having shared fd is enough.
Think about pipes, sockets and terminals. A real-world scenario:

* a process started from shell (interactive or shell script)
* it sets O_NONBLOCK and does a read from fd 0...
* it gets killed (kill -9, whatever)
* shell suddenly has its fd 0 in O_NONBLOCK mode
* shell and all subsequent commands started from it unexpectedly have
  O_NONBLOCKed stdin.
--
vda
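
The scenario above can be sketched as follows, assuming a POSIX system and using a pipe in place of the shell's stdin. A child sets O_NONBLOCK on the inherited descriptor, is killed before it can restore the flags, and the parent is left with a nonblocking descriptor. The function name and the second "ready" pipe used for synchronization are hypothetical details of this illustration.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent still sees O_NONBLOCK on its pipe read end
 * after a child set the flag and was killed, 0 if not, -1 on error. */
int nonblock_survives_kill(void)
{
	int data[2], ready[2];
	int flags, got, result = -1;
	pid_t pid;
	char c;

	if (pipe(data) < 0)
		return -1;
	if (pipe(ready) < 0) {
		close(data[0]);
		close(data[1]);
		return -1;
	}

	pid = fork();
	if (pid == 0) {
		/* Child: set O_NONBLOCK and never get to undo it. */
		close(ready[0]);
		flags = fcntl(data[0], F_GETFL);
		if (flags < 0 || fcntl(data[0], F_SETFL, flags | O_NONBLOCK) < 0)
			_exit(1);
		if (write(ready[1], "x", 1) != 1)
			_exit(1);
		for (;;)
			pause();	/* wait for kill -9 */
	}

	close(ready[1]);
	if (pid > 0) {
		/* Wait until the child reports the flag is set, then kill it. */
		got = (read(ready[0], &c, 1) == 1);
		if (got)
			kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		if (got) {
			/* The child is gone; its flag change is not. */
			flags = fcntl(data[0], F_GETFL);
			if (flags >= 0)
				result = (flags & O_NONBLOCK) != 0;
		}
	}

	close(ready[0]);
	close(data[0]);
	close(data[1]);
	return result;
}
```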


Re: F_DUPFD_CLOEXEC implementation

2007-10-01 Thread Al Viro
On Mon, Oct 01, 2007 at 11:07:15AM +0100, Denys Vlasenko wrote:
> Also attached is ndelaytest.c which can be used to test that
> send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
> and that other processes never see O_NONBLOCK set.
> 
> Comments?

Never send patches during or approaching hangover?
* it's on a bunch of cyclic lists.  Have its neighbor
go away while you are doing all that crap => boom
* there's that thing called current position...  It gets buggered.
* overwriting it while another task might be in the middle of
syscall involving it => boom
* non-cooperative tasks reading *in* *parallel* from the same
opened file are going to have a lot more serious problems than agreeing
on O_NONBLOCK anyway, so I really don't understand what the hell is that for.



Re: F_DUPFD_CLOEXEC implementation

2007-10-01 Thread Denys Vlasenko
On Monday 01 October 2007 04:15, Davide Libenzi wrote:
> On Mon, 1 Oct 2007, Denys Vlasenko wrote:
> 
> > My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
> > It may be a pipe or a socket.
> > 
> > There may be other processes which share this descriptor with me,
> > I simply cannot know that. And they, too, may want to do reads on it.
> > 
> > I want to do nonblocking read in such a way that neither those other
> > processes will ever see fd switching to O_NONBLOCK and back, and
> > I also want to be safe from other processes doing the same.
> > 
> > I don't see how this can be done using standard unix primitives.
> 
> Indeed. You could simulate non-blocking using poll with zero timeout, but 
> if another task may read/write on it, your following read/write may end up 
> blocking even after a poll returned the required events.
> One way to solve this would be some sort of readx/writex where you pass an 
> extra flags parameter

We have that already. They are called send and recv. ;)

> (this could be done with sys_indirect, assuming  
> we'll ever get that mainline) where you specify the non-blocking 
> requirement for-this-call, and not as global per-file flag. Then, of 
> course, you'll have to modify all the "file->f_flags & O_NONBLOCK" tests 
> (and there are many of them) to check for that flag too (that can be a 
> per task_struct flag).

Attached patch detects send/recv(fd, buf, size, MSG_DONTWAIT) on
non-sockets and turns them into non-blocking write/read.
Since filp->f_flags appear to be read and modified without any locking,
I cannot modify it without potentially affecting other processes
accessing the same file through shared struct file.

Therefore I simply make a temporary copy of struct file, set
O_NONBLOCK in it and pass it to vfs_read/write.
Is this heresy? ;) I see only one spinlock in struct file:

#ifdef CONFIG_EPOLL
spinlock_t  f_ep_lock;
#endif /* #ifdef CONFIG_EPOLL */

Do I need to take it?

Also attached is ndelaytest.c which can be used to test that
send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
and that other processes never see O_NONBLOCK set.

Comments?
--
vda
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SECONDS 10

#define STR "."
//#define STR "123456789 123456789 123456789 123456789 "

/* To see send() resulting in EAGAIN:
 * strace -ff -o log ndelaytest | while sleep 11; do break; done
 * log.$PID:
 * send(1, "123456789 123456789 123456789 12"..., 40, MSG_DONTWAIT)
 *= -1 EAGAIN (Resource temporarily unavailable)
 */

int main()
{
	pid_t pid;
	time_t t;
	int fl;

	puts("starting");
	t = time(0);

	pid = fork();
	if (pid == 0) {
		/* child */
		while ((time(0) - t) < SECONDS-1) {
#if 0 
			/* Uncomment this part and simply run the executable
			 * to see race detection code in action */
#define OP "write"
			fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
			fl = write(1, STR, sizeof(STR) - 1);
			fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~O_NONBLOCK);
#else
			/* This part tests whether send(MSG_DONTWAIT)
			 * is racy or not */
#define OP "send"
			fl = send(1, STR, sizeof(STR) - 1, MSG_DONTWAIT);
#endif
			if (fl < 0) {
				perror(OP);
				kill(getppid(), SIGKILL);
				return 1;
			}
		}
		return 0;
	}

	while ((time(0) - t) < SECONDS) {
		fl = fcntl(1, F_GETFL);
		if (fl & O_NONBLOCK) {
			fprintf(stderr, "NONBLOCK:1\n");
			kill(pid, SIGKILL);
			fcntl(1, F_SETFL, fl & ~O_NONBLOCK);
			return 1;
		}
	}
	fprintf(stderr, "NONBLOCK:0\n");
	return 0;
}
--- linux-2.6.22-rc6.src/fs/read_write.c	Fri Jun 15 19:30:05 2007
+++ linux-2.6.22-rc6_ndelay/fs/read_write.c	Sun Aug 19 10:43:24 2007
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "read_write.h"
 
 #include 
@@ -351,6 +352,36 @@
 static inline void file_pos_write(struct file *file, loff_t pos)
 {
 	file->f_pos = pos;
+}
+
+/* Helper for send/recv on non-sockets */
+ssize_t rw_with_flags(struct file *file, int fput_needed, void __user *buf, size_t count, unsigned flags)
+{
+	int err;
+	loff_t pos;
+	struct file *file_copy;
+
+	file_copy = file;
+	if (flags & MSG_DONTWAIT) {
+		/* We make copy even if O_NONBLOCK is already set. */
+		/* We don't want it to change under our feet. */
+		file_copy = kmalloc(sizeof(*file_copy), GFP_KERNEL);
+		memcpy(file_copy, file, sizeof(*file_copy));
+		file_copy->f_flags |= O_NONBLOCK;
+	}
+
+	pos = file_pos_read(file);
+	if (flags & MSG_OOB) /* MSG_OOB is reused to mean 'write' */
+		err = vfs_write(file_copy, buf, count, &pos);
+	else
+		err = vfs_read(file_copy, buf, count, &pos);
+	file_pos_write(file, pos);
+
+	if (flags & MSG_DONTWAIT) {
+		kfree(file_copy);
+	}
+	fput_light(file, fput_needed);
+	return err;
 }
 
 asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
--- linux-2.6.22-rc6.src/include/linux/fs.h	Wed Jun 27 21:24:18 2007
+++ linux-2.6.22-rc6_ndelay/include/linux/fs.h	Sun Aug 19 10:32:20 2007
@@ -1154

Re: F_DUPFD_CLOEXEC implementation

2007-09-30 Thread Davide Libenzi
On Mon, 1 Oct 2007, Denys Vlasenko wrote:

> My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
> It may be a pipe or a socket.
> 
> There may be other processes which share this descriptor with me,
> I simply cannot know that. And they, too, may want to do reads on it.
> 
> I want to do nonblocking read in such a way that neither those other
> processes will ever see fd switching to O_NONBLOCK and back, and
> I also want to be safe from other processes doing the same.
> 
> I don't see how this can be done using standard unix primitives.

Indeed. You could simulate non-blocking using poll with zero timeout, but 
if another task may read/write on it, your following read/write may end up 
blocking even after a poll returned the required events.
One way to solve this would be some sort of readx/writex where you pass an 
extra flags parameter (this could be done with sys_indirect, assuming 
we'll ever get that mainline) where you specify the non-blocking 
requirement for-this-call, and not as global per-file flag. Then, of 
course, you'll have to modify all the "file->f_flags & O_NONBLOCK" tests 
(and there are many of them) to check for that flag too (that can be a 
per task_struct flag).



- Davide
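
A minimal sketch of the poll-with-zero-timeout approach described above, assuming a POSIX system. The helper name try_read and the choice of reporting "nothing ready" as EAGAIN are assumptions of this illustration. The comment marks the window in which the approach can still block.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: read without waiting, without touching O_NONBLOCK.
 * Returns the byte count, 0 on EOF, or -1 with errno set (EAGAIN when
 * nothing was readable at the time of the poll). */
ssize_t try_read(int fd, void *buf, size_t count)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, 0);	/* timeout 0: do not wait */

	if (n < 0)
		return -1;		/* errno set by poll() */
	if (n == 0) {
		errno = EAGAIN;		/* nothing readable right now */
		return -1;
	}
	/* Race window: another task sharing this open file may consume
	 * the data between poll() and read(), and then this read() blocks. */
	return read(fd, buf, count);
}
```

With a single reader this behaves like a nonblocking read; with several readers on the same open file it is only a best effort, which is the limitation pointed out above.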




Re: F_DUPFD_CLOEXEC implementation

2007-09-30 Thread Miquel van Smoorenburg
In article <[EMAIL PROTECTED]>,
Denys Vlasenko  <[EMAIL PROTECTED]> wrote:
>Hi Ulrich,
>
>On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
>> One more small change to extend the availability of creation of
>> file descriptors with FD_CLOEXEC set.  Adding a new command to
>> fcntl() requires no new system call and the overall impact on
>> code size is minimal.
>
>Tangential question: do you have any idea how userspace can
>safely do nonblocking read or write on a potentially-shared fd?
>
>IIUC, currently it cannot be done without races:
>
>old_flags = fcntl(fd, F_GETFL);
>...other process may change flags!...
>fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
>read(fd, ...)
>...other process may see flags changed under its feet!...
>fcntl(fd, F_SETFL, old_flags);
>
>Can this be fixed?

This is for sockets, right? Just use recv() instead of read().

n = recv(filedesc, buffer, buflen, MSG_DONTWAIT);

.. is equivalent to setting O_NONBLOCK. See "man recv".

Mike.
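
A small sketch of this suggestion, assuming a POSIX system with AF_UNIX sockets. It checks that recv() with MSG_DONTWAIT on an empty socket fails with EAGAIN (or EWOULDBLOCK) while O_NONBLOCK stays clear on the descriptor, and that the same call on a pipe is rejected with ENOTSOCK. Both function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if recv(MSG_DONTWAIT) on an empty socket fails with
 * EAGAIN/EWOULDBLOCK and leaves O_NONBLOCK clear, 0 otherwise,
 * -1 if the socket pair could not be created. */
int recv_dontwait_leaves_flags(void)
{
	int sv[2];
	char buf[16];
	ssize_t n;
	int saved_errno, flags, ok;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
	saved_errno = errno;
	flags = fcntl(sv[0], F_GETFL);

	ok = n == -1
	    && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)
	    && flags >= 0 && !(flags & O_NONBLOCK);

	close(sv[0]);
	close(sv[1]);
	return ok;
}

/* Returns the errno produced by recv(MSG_DONTWAIT) on a pipe,
 * 0 if the call unexpectedly succeeds, -1 if pipe() fails. */
int recv_on_pipe_errno(void)
{
	int p[2];
	char buf[16];
	int err = 0;

	if (pipe(p) < 0)
		return -1;
	if (recv(p[0], buf, sizeof(buf), MSG_DONTWAIT) < 0)
		err = errno;
	close(p[0]);
	close(p[1]);
	return err;
}
```

The second function shows why this only answers the question for sockets: when stdin is a pipe or a terminal, recv() is not available at all.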


Re: F_DUPFD_CLOEXEC implementation

2007-09-30 Thread Denys Vlasenko
On Monday 01 October 2007 00:11, Davide Libenzi wrote:
> On Sun, 30 Sep 2007, Denys Vlasenko wrote:
> 
> > Hi Ulrich,
> > 
> > On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> > > One more small change to extend the availability of creation of
> > > file descriptors with FD_CLOEXEC set.  Adding a new command to
> > > fcntl() requires no new system call and the overall impact on
> > > code size is minimal.
> > 
> > Tangential question: do you have any idea how userspace can
> > safely do nonblocking read or write on a potentially-shared fd?
> > 
> > IIUC, currently it cannot be done without races:
> > 
> > old_flags = fcntl(fd, F_GETFL);
> > ...other process may change flags!...
> > fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
> > read(fd, ...)
> > ...other process may see flags changed under its feet!...
> > fcntl(fd, F_SETFL, old_flags);
> > 
> > Can this be fixed?
> 
> I'm not sure I understood your use case correctly. But, if you have two 
> processes/threads randomly switching O_NONBLOCK on/off, your problems 
> arise not only at F_SETFL time.

My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
It may be a pipe or a socket.

There may be other processes which share this descriptor with me,
I simply cannot know that. And they, too, may want to do reads on it.

I want to do nonblocking read in such a way that neither those other
processes will ever see fd switching to O_NONBLOCK and back, and
I also want to be safe from other processes doing the same.

I don't see how this can be done using standard unix primitives.
--
vda


Re: F_DUPFD_CLOEXEC implementation

2007-09-30 Thread Davide Libenzi
On Sun, 30 Sep 2007, Denys Vlasenko wrote:

> Hi Ulrich,
> 
> On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> > One more small change to extend the availability of creation of
> > file descriptors with FD_CLOEXEC set.  Adding a new command to
> > fcntl() requires no new system call and the overall impact on
> > code size is minimal.
> 
> Tangential question: do you have any idea how userspace can
> safely do nonblocking read or write on a potentially-shared fd?
> 
> IIUC, currently it cannot be done without races:
> 
> old_flags = fcntl(fd, F_GETFL);
> ...other process may change flags!...
> fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
> read(fd, ...)
> ...other process may see flags changed under its feet!...
> fcntl(fd, F_SETFL, old_flags);
> 
> Can this be fixed?

I'm not sure I understood your use case correctly. But, if you have two 
processes/threads randomly switching O_NONBLOCK on/off, your problems 
arise not only at F_SETFL time.
If one of the tasks is not expecting an fd to be O_NONBLOCK, it will 
likely end up not handling read/write-miss situations correctly.
In that case it'd be better to keep the fd as O_NONBLOCK, and manually 
create blocking behaviour (when needed) with poll+read/write.



- Davide
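
A minimal sketch of the poll+read approach suggested above, assuming a POSIX system and an fd that every user agrees to keep in O_NONBLOCK mode. The helper names set_nonblock and blocking_read are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into O_NONBLOCK mode once, up front. Returns 0 or -1. */
int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Blocking behaviour built by hand on a nonblocking fd.
 * Returns the byte count, 0 on EOF, or -1 on a real error. */
ssize_t blocking_read(int fd, void *buf, size_t count)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;

	for (;;) {
		n = read(fd, buf, count);
		if (n >= 0)
			return n;		/* data or EOF */
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;		/* real error */
		/* Nothing there yet: sleep until the fd looks readable.
		 * If another reader takes the data first, read() fails
		 * with EAGAIN again and the loop simply waits once more. */
		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -1;
	}
}
```

The flags are never toggled after setup, so the set/read/unset race does not arise. The cost is that every task sharing the descriptor has to cope with EAGAIN.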




Re: F_DUPFD_CLOEXEC implementation

2007-09-29 Thread Denys Vlasenko
Hi Ulrich,

On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> One more small change to extend the availability of creation of
> file descriptors with FD_CLOEXEC set.  Adding a new command to
> fcntl() requires no new system call and the overall impact on
> code size is minimal.

Tangential question: do you have any idea how userspace can
safely do nonblocking read or write on a potentially-shared fd?

IIUC, currently it cannot be done without races:

old_flags = fcntl(fd, F_GETFL);
...other process may change flags!...
fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
read(fd, ...)
...other process may see flags changed under its feet!...
fcntl(fd, F_SETFL, old_flags);

Can this be fixed?
--
vda


Re: F_DUPFD_CLOEXEC implementation

2007-09-28 Thread Ulrich Drepper

Davide Libenzi wrote:
> I think new system calls would have been a cleaner way to accomplish this. 
> The "small pill at a time" may have better chance to go in, but will 
> likely result in an uglier userspace interface.

We'd need this call anyway since neither dup nor dup2 provides the
functionality of F_DUPFD (but F_DUPFD can be used to implement dup).

For dup2() I will wait until we have a sys_indirect implementation.
I'll try to get this soon.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: F_DUPFD_CLOEXEC implementation

2007-09-28 Thread Davide Libenzi
On Fri, 28 Sep 2007, Ulrich Drepper wrote:

> One more small change to extend the availability of creation of
> file descriptors with FD_CLOEXEC set.  Adding a new command to
> fcntl() requires no new system call and the overall impact on
> code size is minimal.
> 
> If this patch gets accepted we will also add this change to the
> next revision of the POSIX spec.
> 
> To test the patch, use the following little program.  Adjust the
> value of F_DUPFD_CLOEXEC appropriately.

I think new system calls would have been a cleaner way to accomplish this. 
The "small pill at a time" may have better chance to go in, but will 
likely result in an uglier userspace interface.
In any case, this is better than *nothing*, if it makes it easier to use 
fds inside system libraries.



- Davide

