Re: Read/write counts
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

Well, I think Linus was saying that we have to do both (where the
signal interrupts and where it doesn't), and I agree with that:

    There are enough reasons to discourage people from using
    uninterruptible sleep ("this f*cking application won't die when the
    network goes down") that I don't think this is an issue. We need to
    handle both cases, and while we can expand on the two cases we have
    now, we can't remove them.

Fortunately, although the -ERESTARTSYS framework is a little awkward
(and people can shoot arrows at me for creating it 15 years ago :-), we
do have a way of supporting both styles without _too_ much pain.

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Read/write counts
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> On Mon, 4 Jun 2007, Theodore Tso wrote:
>
> > Hmm, I'm not sure I would go that far. Per the POSIX specification,
> > we support the optional BSD-style restartable system calls for signals
> > which will avoid short reads; but this is only true if SA_RESTART is
> > passed to sigaction(). Without SA_RESTART, we will indeed return
> > short reads, as required by POSIX.
> >
> > I don't think Linus has said that short reads are always evil; I
> > certainly can't remember him ever making that statement. Do you have
> > a pointer to a LKML message where he's said that?
>
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

He said 'disk read', not 'read(2)'. I'd expect he means certain things
like stat(2) and readdir(2) when they have to go to disk. read(2)
explicitly lists EINTR as a valid result, and often folks use signals to
interrupt read(2). The world certainly writes programs to expect short
read(2).

Joel

--

"Gone to plant a weeping willow
 On the bank's green edge it will roll, roll, roll.
 Sing a lullaby beside the waters.
 Lovers come and go, the river roll, roll, rolls."

Joel Becker
Principal Software Developer
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
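To illustrate Joel's point that signals are often used to interrupt read(2), here is a minimal sketch of a read with a timeout built on alarm(2). The helper name read_with_timeout and the empty on_alarm handler are hypothetical, not taken from the thread. The handler is installed without SA_RESTART so that a blocked read(2) fails with EINTR when the alarm fires. The sketch has a small race (the alarm could fire before read(2) is entered), so treat it as an illustration rather than production code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt the system call. */
static void on_alarm(int signo) { (void)signo; }

/* Hypothetical helper: read(2) with a timeout in whole seconds.
 * Returns read(2)'s result. If the timeout expires before any data
 * arrives, returns -1 with errno set to EINTR. */
ssize_t read_with_timeout(int fd, void *buf, size_t count, unsigned seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(seconds);
    n = read(fd, buf, count);
    saved_errno = errno;             /* preserve errno across cleanup */

    alarm(0);                        /* cancel a pending alarm */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */
    errno = saved_errno;
    return n;
}
```

A caller would treat a return of -1 with errno == EINTR as "timed out" and any positive return, even one smaller than count, as valid data.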
Re: Read/write counts
Hi,

On Mon, 4 Jun 2007, Theodore Tso wrote:

> Hmm, I'm not sure I would go that far. Per the POSIX specification,
> we support the optional BSD-style restartable system calls for signals
> which will avoid short reads; but this is only true if SA_RESTART is
> passed to sigaction(). Without SA_RESTART, we will indeed return
> short reads, as required by POSIX.
>
> I don't think Linus has said that short reads are always evil; I
> certainly can't remember him ever making that statement. Do you have
> a pointer to a LKML message where he's said that?

That's the last discussion about signals and I/O I can remember:
http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

bye, Roman
Re: Read/write counts
On Mon, Jun 04, 2007 at 11:02:23AM -0600, Matthew Wilcox wrote:
> On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> > Programs that assume a full transfer are fairly common, but are
> > universally regarded as either broken or just lazy, and when it does cause
> > a problem, it is far more common to fix the application than the kernel.
>
> Linus has explicitly forbidden short reads from being returned. The
> original poster may get away with it for a specialised case, but for
> example, signals may not cause a return to userspace with a short read
> for exactly this reason.

Hmm, I'm not sure I would go that far. Per the POSIX specification,
we support the optional BSD-style restartable system calls for signals
which will avoid short reads; but this is only true if SA_RESTART is
passed to sigaction(). Without SA_RESTART, we will indeed return
short reads, as required by POSIX.

I don't think Linus has said that short reads are always evil; I
certainly can't remember him ever making that statement. Do you have
a pointer to a LKML message where he's said that?

- Ted
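For readers unfamiliar with the flag Ted mentions, the following sketch shows how a program opts in to or out of restartable system calls when it installs a signal handler with sigaction(). The names install_handler and on_signal are hypothetical and chosen only for this illustration. It shows only how the flag is requested; the exact set of calls that are restarted is platform-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Placeholder handler; a real program would do its work here. */
static void on_signal(int signo) { (void)signo; }

/* Hypothetical helper: install on_signal for `signo`.
 * restart != 0 requests BSD-style restartable system calls (SA_RESTART);
 * restart == 0 leaves the flag clear, so a blocked call such as read(2)
 * can be interrupted by the signal.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_handler(int signo, int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}
```

For example, install_handler(SIGUSR1, 1) asks for restart semantics, while install_handler(SIGUSR1, 0) asks for the interruptible behaviour discussed in this thread.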
Re: Read/write counts
On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> Programs that assume a full transfer are fairly common, but are
> universally regarded as either broken or just lazy, and when it does cause
> a problem, it is far more common to fix the application than the kernel.

Linus has explicitly forbidden short reads from being returned. The
original poster may get away with it for a specialised case, but for
example, signals may not cause a return to userspace with a short read
for exactly this reason.
Re: Read/write counts
>It is not strictly an error to read/write less than the requested amount,
>but you will find that a lot of applications don't handle this correctly.

I'd give it a slightly different nuance. It's not an error, and it's a
reasonable thing to do, but there is value in not doing it.

POSIX and its predecessors back to the beginning of Unix say
read()/write() don't have to transfer the full count (they must transfer
at least one byte). The main reason for this choice is that it may
require more resources (e.g. a memory buffer) than the system can
allocate to do the whole request at once.

Programs that assume a full transfer are fairly common, but are
universally regarded as either broken or just lazy, and when it does
cause a problem, it is far more common to fix the application than the
kernel.

Most application programs access files via libc's fread/fwrite, which
don't have partial transfers. GNU libc does handle partial (kernel)
reads and writes correctly. I'd be surprised if someone can name a major
application that doesn't.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems
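As an illustration of what handling partial transfers correctly looks like in an application, here is a minimal sketch of a loop that keeps calling read(2) until the requested count has arrived. The helper name read_full is hypothetical; it is not a libc function and not something proposed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read(2) repeatedly until `count` bytes have
 * been read, end-of-file is reached, or a real error occurs.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or -1 with errno set on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n > 0)
            done += (size_t)n;   /* partial transfer: ask for the rest */
        else if (n == 0)
            break;               /* end of file */
        else if (errno != EINTR)
            return -1;           /* real error */
        /* n < 0 with EINTR: interrupted before any data moved, retry */
    }
    return (ssize_t)done;
}
```

With a wrapper like this, a file system that returns 486 bytes for a 512-byte request simply causes one more read(2) call for the remaining 26 bytes.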
Re: Read/write counts
On Jun 04, 2007 06:20 -0400, David H. Lynch Jr. wrote:
> The net result is that implementation would be simpler if I could
> just read/write the amount of data that can be done with the least
> amount of work, even if that is less than was requested.
>
> If I receive a request to read 512 bytes, and I return that I have read
> 486, is either the OS, libc, or something else going to treat that as an
> error, or are they coming back for the rest in a subsequent call?
>
> I thought I recalled that read()/write() returning a count less than
> requested is not an error.

It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.
They will assume that if the amount read/written is != amount requested
that this is an error. Of course the opposite is also true - some
applications assume that the amount requested == amount read/written and
don't even check whether that is actually the case or not.

Cheers, Andreas

--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
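The two application mistakes Andreas describes (treating a short count as an error, or never checking the count at all) can be avoided by distinguishing four outcomes of a read(2) call. The sketch below is only an illustration; the enum and the function classify_read are hypothetical names introduced here.

```c
#include <stddef.h>
#include <sys/types.h>

/* Possible outcomes of a single read(2) call. */
enum xfer_result { XFER_ERROR, XFER_EOF, XFER_SHORT, XFER_FULL };

/* Hypothetical helper: classify the return value `n` of read(2) for a
 * request of `requested` bytes. Only a negative return is an error;
 * a short count is a valid result that the caller should follow up
 * with another read for the remainder. */
enum xfer_result classify_read(ssize_t n, size_t requested)
{
    if (n < 0)
        return XFER_ERROR;
    if (n == 0 && requested > 0)
        return XFER_EOF;
    if ((size_t)n < requested)
        return XFER_SHORT;
    return XFER_FULL;
}
```

For the case in the original question, classify_read(486, 512) yields XFER_SHORT rather than XFER_ERROR.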
Read/write counts
I have a file system that has really odd blocking.

All files have a variable length header (basically a directory entry)
at their start. Most, but not all, sectors have a small fixed length
signature as well as some link data at their start.

The net result is that implementation would be simpler if I could
just read/write the amount of data that can be done with the least
amount of work, even if that is less than was requested.

If I receive a request to read 512 bytes, and I return that I have read
486, is either the OS, libc, or something else going to treat that as an
error, or are they coming back for the rest in a subsequent call?

I thought I recalled that read()/write() returning a count less than
requested is not an error.