Re: [RFC] should read(2) update the position if it returns an error?
On Fri, Jun 14, 2013 at 08:38:19AM -0700, Linus Torvalds wrote: > On Fri, Jun 14, 2013 at 1:05 AM, Al Viro wrote: > > > > Comments? I'd obviously prefer to solve it that way (i.e. leave > > ->f_pos untouched if vfs_read() returns an error), but I might be missing > > some case where we want position updated even though read() returns an > > error. I can't come up with one, but then I hadn't RTFS through every > > ->read() instance in drivers in search of weird cases like that - we've > > too many instances ;-/ > > Not updating f_pos on errors sounds like the right thing to do to me, > and if it also ends up fixing some nasty issues with hpfs and > potentially other cases, I'd say "go for it". > > Not for 3.10, though. It's not like this is a new - or acute - problem. Grr... I've just crawled through ->llseek() instances and I really wonder what to do with that crap. Debugfs idiocies had been obvious, but there are much stranger turds floating there. Some random examples: drivers/sbus/char/flash.c: SEEK_CUR beyond the EOF silently trims position to EOF SEEK_SET to the same position just sets it SEEK_END sets position to EOF, period - offset is simply ignored cris eeprom: SEEK_END inverts the sign of offset any seek to negative returns EOVERFLOW (instead of usual EINVAL) *and* sets position to 0 any seek to or beyond the EOF (there's an off-by-one) also returns EOVERFLOW *and* sets position to EOF - 1. Yes, that includes lseek(fd, SEEK_END, 0) carma-fpga-program.c: SEEK_END inverts sign of offset seeks past EOF give EINVAL seeks to negative offsets pass on 32bit and give EINVAL on 64bit drivers/s390/char/vmur.c: fails when offset is not a multiple of 4Kb; the trouble being, read() can bloody well leave you with position not being such a multiple, so that check buys us nothing - SEEK_CUR for 12Kb might be an equivalent of SEEK_SET to 13Kb; the former is allowed, the latter isn't. drivers/staging/vme/devices/vme_user.c: apparent off-by-one - SEEK_END:-1 succeeds, SEEK_END:0 gives EINVAL. Then there's a bunch of drivers that either don't allow SEEK_END while having an upper limit to seekable positions acting as EOF would or allow seeks past what SEEK_END:0 yields. All that mess is a user-visible ABI and at least for the weird eeprom devices I would be very cautious with fixing it - the userland code dealing with those animals is rather obscure, can easily turn out to rely on the idiotic behaviour and brick the boxen if things go wrong. I'm switching everything I can to use of generic_file_llseek_size(); quite a few can be switched, often along with dropping the locking that served only to serialize lseek vs. lseek. The rest is going to be unpleasant. There's another pile of fun around f_pos manipulations - try to run ecryptfs on top of tmpfs or ceph and do seeks on directories. ecryptfs ->readdir() assigns ->f_pos of underlying file and calls underlying ->readdir(); the trouble is, underlying fs has things it wants done on ->f_pos changes (cursor needs to be moved, etc.). ecryptfs on top of hpfs is also a lot of fun, not that anybody had been using that combination. Sigh... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] should read(2) update the position if it returns an error?
On Fri, Jun 14, 2013 at 08:38:19AM -0700, Linus Torvalds wrote: On Fri, Jun 14, 2013 at 1:05 AM, Al Viro v...@zeniv.linux.org.uk wrote: Comments? I'd obviously prefer to solve it that way (i.e. leave -f_pos untouched if vfs_read() returns an error), but I might be missing some case where we want position updated even though read() returns an error. I can't come up with one, but then I hadn't RTFS through every -read() instance in drivers in search of weird cases like that - we've too many instances ;-/ Not updating f_pos on errors sounds like the right thing to do to me, and if it also ends up fixing some nasty issues with hpfs and potentially other cases, I'd say go for it. Not for 3.10, though. It's not like this is a new - or acute - problem. Grr... I've just crawled through -llseek() instances and I really wonder what to do with that crap. Debugfs idiocies had been obvious, but there are much stranger turds floating there. Some random examples: drivers/sbus/char/flash.c: SEEK_CUR beyond the EOF silently trims position to EOF SEEK_SET to the same position just sets it SEEK_END sets position to EOF, period - offset is simply ignored cris eeprom: SEEK_END inverts the sign of offset any seek to negative returns EOVERFLOW (instead of usual EINVAL) *and* sets position to 0 any seek to or beyond the EOF (there's an off-by-one) also returns EOVERFLOW *and* sets position to EOF - 1. Yes, that includes lseek(fd, SEEK_END, 0) carma-fpga-program.c: SEEK_END inverts sign of offset seeks past EOF give EINVAL seeks to negative offsets pass on 32bit and give EINVAL on 64bit drivers/s390/char/vmur.c: fails when offset is not a multiple of 4Kb; the trouble being, read() can bloody well leave you with position not being such a multiple, so that check buys us nothing - SEEK_CUR for 12Kb might be an equivalent of SEEK_SET to 13Kb; the former is allowed, the latter isn't. drivers/staging/vme/devices/vme_user.c: apparent off-by-one - SEEK_END:-1 succeeds, SEEK_END:0 gives EINVAL. Then there's a bunch of drivers that either don't allow SEEK_END while having an upper limit to seekable positions acting as EOF would or allow seeks past what SEEK_END:0 yields. All that mess is a user-visible ABI and at least for the weird eeprom devices I would be very cautious with fixing it - the userland code dealing with those animals is rather obscure, can easily turn out to rely on the idiotic behaviour and brick the boxen if things go wrong. I'm switching everything I can to use of generic_file_llseek_size(); quite a few can be switched, often along with dropping the locking that served only to serialize lseek vs. lseek. The rest is going to be unpleasant. There's another pile of fun around f_pos manipulations - try to run ecryptfs on top of tmpfs or ceph and do seeks on directories. ecryptfs -readdir() assigns -f_pos of underlying file and calls underlying -readdir(); the trouble is, underlying fs has things it wants done on -f_pos changes (cursor needs to be moved, etc.). ecryptfs on top of hpfs is also a lot of fun, not that anybody had been using that combination. Sigh... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] should read(2) update the position if it returns an error?
On Fri, Jun 14, 2013 at 1:05 AM, Al Viro wrote: > > Comments? I'd obviously prefer to solve it that way (i.e. leave > ->f_pos untouched if vfs_read() returns an error), but I might be missing > some case where we want position updated even though read() returns an > error. I can't come up with one, but then I hadn't RTFS through every > ->read() instance in drivers in search of weird cases like that - we've > too many instances ;-/ Not updating f_pos on errors sounds like the right thing to do to me, and if it also ends up fixing some nasty issues with hpfs and potentially other cases, I'd say "go for it". Not for 3.10, though. It's not like this is a new - or acute - problem. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] should read(2) update the position if it returns an error?
We have an unpleasant HPFS (at least) race with read(2) vs. unlink(2) The thing is, HPFS and several other filesystems keep track of all opened struct file for directory and update the position in it upon directory modifications. For HPFS it's particulary painful, since it encodes the location of directory block (times 64 + number of entry in block) as ->f_pos. lseek() validates the offset under i_mutex, directory modifications update ->f_pos of all files (again, under i_mutex) and readdir does all accesses under i_mutex. The trouble with that scheme is sys_read(). There we save the current position, pass the address of local copy to vfs_read() and, once vfs_read() has returned an error, store the value in that local copy back into ->f_pos. The value is unmodified, of course, and everything would be fine if not for the following problem: directory modification done in parallel with that has no idea of that local copy and does *not* update it. It does update ->f_pos, but that update gets reverted as soon as vfs_read() returns and sys_read() does file_pos_write(). It's not just HPFS - HFS, HFS+ and sysfs have a similar (but probably milder) issue. For HPFS it's really nasty - we might end up with subsequent readdir() reading from a completely unrelated directory ;-/ We obviously don't want to grab i_mutex in sys_read(), but I really wonder if we should do file_pos_write() there in case of vfs_read() returning an error. Most of the ->read() instances leave *pos unchanged if they decide to return an error - after all, if we'd consumed some data, the right thing is short read, with whatever errors happening on subsequent reads. I'm not sure if it's mandated by POSIX, but it seems to be what the userland would reasonably expect. And that matches the usual logics with e.g. interruptible waits catching signals in ->read() instances... Comments? I'd obviously prefer to solve it that way (i.e. leave ->f_pos untouched if vfs_read() returns an error), but I might be missing some case where we want position updated even though read() returns an error. I can't come up with one, but then I hadn't RTFS through every ->read() instance in drivers in search of weird cases like that - we've too many instances ;-/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] should read(2) update the position if it returns an error?
We have an unpleasant HPFS (at least) race with read(2) vs. unlink(2) The thing is, HPFS and several other filesystems keep track of all opened struct file for directory and update the position in it upon directory modifications. For HPFS it's particulary painful, since it encodes the location of directory block (times 64 + number of entry in block) as -f_pos. lseek() validates the offset under i_mutex, directory modifications update -f_pos of all files (again, under i_mutex) and readdir does all accesses under i_mutex. The trouble with that scheme is sys_read(). There we save the current position, pass the address of local copy to vfs_read() and, once vfs_read() has returned an error, store the value in that local copy back into -f_pos. The value is unmodified, of course, and everything would be fine if not for the following problem: directory modification done in parallel with that has no idea of that local copy and does *not* update it. It does update -f_pos, but that update gets reverted as soon as vfs_read() returns and sys_read() does file_pos_write(). It's not just HPFS - HFS, HFS+ and sysfs have a similar (but probably milder) issue. For HPFS it's really nasty - we might end up with subsequent readdir() reading from a completely unrelated directory ;-/ We obviously don't want to grab i_mutex in sys_read(), but I really wonder if we should do file_pos_write() there in case of vfs_read() returning an error. Most of the -read() instances leave *pos unchanged if they decide to return an error - after all, if we'd consumed some data, the right thing is short read, with whatever errors happening on subsequent reads. I'm not sure if it's mandated by POSIX, but it seems to be what the userland would reasonably expect. And that matches the usual logics with e.g. interruptible waits catching signals in -read() instances... Comments? I'd obviously prefer to solve it that way (i.e. leave -f_pos untouched if vfs_read() returns an error), but I might be missing some case where we want position updated even though read() returns an error. I can't come up with one, but then I hadn't RTFS through every -read() instance in drivers in search of weird cases like that - we've too many instances ;-/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] should read(2) update the position if it returns an error?
On Fri, Jun 14, 2013 at 1:05 AM, Al Viro v...@zeniv.linux.org.uk wrote: Comments? I'd obviously prefer to solve it that way (i.e. leave -f_pos untouched if vfs_read() returns an error), but I might be missing some case where we want position updated even though read() returns an error. I can't come up with one, but then I hadn't RTFS through every -read() instance in drivers in search of weird cases like that - we've too many instances ;-/ Not updating f_pos on errors sounds like the right thing to do to me, and if it also ends up fixing some nasty issues with hpfs and potentially other cases, I'd say go for it. Not for 3.10, though. It's not like this is a new - or acute - problem. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/